Training my own model in OpenNLP

I am finding it difficult to create my own model in OpenNLP.
Can anyone tell me how to create my own model?
How should the training be done?
What should the input be, and where will the output model file be stored?

https://opennlp.apache.org/docs/1.5.3/manual/opennlp.html
This website is very useful: it shows, both in code and with the OpenNLP command-line application, how to train models of all the different types, e.g. entity extraction and part of speech.
I could give you some code examples here, but the page is very clear.
Theory-wise:
Essentially, you create a file which lists the items you want to train on, e.g.:
Sport [whitespace] this is a page about football, rugby and stuff
Politics [whitespace] this is a page about tony blair being prime minister.
The format is described on the page above (each model expects a different format). Once you have created this file, you run it through either the API or the opennlp application (via the command line), and it generates a .bin file. Once you have this .bin file, you can load it into a model and start using it (as per the API on the website above).
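As a rough sketch of the API route for the document-categorizer example above (following the 1.5.x manual; file names are illustrative):

import java.io.*;
import opennlp.tools.doccat.DoccatModel;
import opennlp.tools.doccat.DocumentCategorizerME;
import opennlp.tools.doccat.DocumentSample;
import opennlp.tools.doccat.DocumentSampleStream;
import opennlp.tools.util.ObjectStream;
import opennlp.tools.util.PlainTextByLineStream;

public class DoccatTraining {
    public static void main(String[] args) throws IOException {
        // One training sample per line: category, whitespace, then the text.
        InputStream dataIn = new FileInputStream("en-doccat.train");
        ObjectStream<String> lineStream = new PlainTextByLineStream(dataIn, "UTF-8");
        ObjectStream<DocumentSample> sampleStream = new DocumentSampleStream(lineStream);

        DoccatModel model = DocumentCategorizerME.train("en", sampleStream);

        // The .bin model file is written wherever you point the output stream.
        OutputStream modelOut = new BufferedOutputStream(new FileOutputStream("en-doccat.bin"));
        model.serialize(modelOut);
        modelOut.close();
    }
}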

First you need to prepare training data annotated with the required entity.
Sentences should be separated with a newline character (\n). Values should be separated from the <START> and <END> tags with a space character.
Let's say you want to create a medicine entity model; the data should look something like this:
<START:medicine> Augmentin-Duo <END> is a penicillin antibiotic that contains two medicines - <START:medicine> amoxicillin trihydrate <END> and
<START:medicine> potassium clavulanate <END>. They work together to kill certain types of bacteria and are used to treat certain types of bacterial infections.
You can refer to a sample dataset as an example. Training data should contain at least 15,000 sentences to get good results.
Then you can train with the OpenNLP TokenNameFinderTrainer command-line tool, as shown below.
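A typical invocation, following the 1.5.x manual (the model and data file names are illustrative):

opennlp TokenNameFinderTrainer -model en-ner-medicine.bin -lang en -data medicine.train -encoding UTF-8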
The output model file will be in .bin format.
Here is the example: Writing a custom NameFinder model in OpenNLP
For more details, refer to the OpenNLP documentation.

Perhaps this article will help you out. It describes how to do TokenNameFinder training from data extracted from Wikipedia...
nuxeo - blog - Mining Wikipedia with Hadoop and Pig for Natural Language Processing

Copy the training data into data.txt and run the code below to get your own mymodel.bin.
You can refer to this sample data: https://github.com/mccraigmccraig/opennlp/blob/master/src/test/resources/opennlp/tools/namefind/AnnotatedSentencesWithTypes.txt
import java.io.BufferedOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.Charset;
import java.util.Collections;

import opennlp.tools.namefind.NameFinderME;
import opennlp.tools.namefind.NameSample;
import opennlp.tools.namefind.NameSampleDataStream;
import opennlp.tools.namefind.TokenNameFinderModel;
import opennlp.tools.util.ObjectStream;
import opennlp.tools.util.PlainTextByLineStream;

public class Training {

    static String onlpModelPath = "mymodel.bin"; // where the trained model gets stored
    static String trainingDataFilePath = "data.txt"; // training data set

    public static void main(String[] args) throws IOException {
        Charset charset = Charset.forName("UTF-8");
        ObjectStream<String> lineStream = new PlainTextByLineStream(
                new FileInputStream(trainingDataFilePath), charset);
        ObjectStream<NameSample> sampleStream = new NameSampleDataStream(lineStream);

        TokenNameFinderModel model;
        try {
            // Overload with explicit iterations and cutoff:
            // model = NameFinderME.train("en", "drugs", sampleStream,
            //         Collections.<String, Object>emptyMap(), 100, 4);
            model = NameFinderME.train("en", "drugs", sampleStream,
                    Collections.<String, Object>emptyMap());
        } finally {
            sampleStream.close();
        }

        BufferedOutputStream modelOut = null;
        try {
            modelOut = new BufferedOutputStream(new FileOutputStream(onlpModelPath));
            model.serialize(modelOut);
        } finally {
            if (modelOut != null)
                modelOut.close();
        }
    }
}
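Once mymodel.bin is written, you can load it back and run it. A minimal sketch (the sample tokens are illustrative; real code would run a tokenizer first):

import java.io.FileInputStream;
import opennlp.tools.namefind.NameFinderME;
import opennlp.tools.namefind.TokenNameFinderModel;
import opennlp.tools.util.Span;

public class Detection {
    public static void main(String[] args) throws Exception {
        TokenNameFinderModel model =
                new TokenNameFinderModel(new FileInputStream("mymodel.bin"));
        NameFinderME nameFinder = new NameFinderME(model);

        // Tokens of one sentence.
        String[] tokens = {"Augmentin-Duo", "is", "a", "penicillin", "antibiotic"};
        Span[] names = nameFinder.find(tokens);
        for (Span span : names) {
            System.out.println(span.getType() + ": " + tokens[span.getStart()]);
        }
    }
}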

Related

query by object value inside array on firebase firestore [duplicate]

This is the structure of my Firestore database:
Expected result: get all the jobs where, in the experience array, the lang value is "Swift".
As per this, I should get the first 2 documents. The 3rd document does not have a "Swift" experience.
Query jobs = db.collection("Jobs").whereArrayContains("experience.lang", "Swift");
jobs.get().addOnSuccessListener(new OnSuccessListener<QuerySnapshot>() {
    @Override
    public void onSuccess(QuerySnapshot queryDocumentSnapshots) {
        // Always the queryDocumentSnapshots size is 0
    }
});
I tried most of the answers but none worked. Is there any way to query data in this structure? The docs only cover plain arrays, not arrays of custom objects.
Actually, it is possible to perform such a query with a database structure like yours. I have replicated your schema, and here are document1, document2, and document3.
Note that you cannot query using partial (incomplete) data. You are using only the lang property to query, which is not correct. You should use an object that contains both properties, lang and years.
Seeing your screenshot, at first glance the experience array is a list of HashMap objects. But here comes the nicest part: that list can simply be mapped into a list of custom objects. Let's map each element of the array to an object of type Experience. The model contains only two properties:
public class Experience {
    public String lang, years;

    public Experience() {}

    public Experience(String lang, String years) {
        this.lang = lang;
        this.years = years;
    }
}
I don't know how you named the class that represents a document, but I named it simply Job. To keep it simple, I have only used two properties:
public class Job {
    public String name;
    public List<Experience> experience;
    // Other properties

    public Job() {}
}
Now, to perform a search for all documents that contain in the array an object with the lang set to Swift, please follow the next steps. First, create a new object of the Experience class:
Experience firstExperience = new Experience("Swift", "1");
Now you can query like so:
CollectionReference jobsRef = rootRef.collection("Jobs");
jobsRef.whereArrayContains("experience", firstExperience).get().addOnCompleteListener(new OnCompleteListener<QuerySnapshot>() {
    @Override
    public void onComplete(@NonNull Task<QuerySnapshot> task) {
        if (task.isSuccessful()) {
            for (QueryDocumentSnapshot document : task.getResult()) {
                Job job = document.toObject(Job.class);
                Log.d(TAG, job.name);
            }
        } else {
            Log.d(TAG, task.getException().getMessage());
        }
    }
});
The result in the logcat will be the name of document1 and document2:
firstJob
secondJob
And this is because only those two documents contain in the array an object where the lang is set to Swift.
You can also achieve the same result when using a Map:
Map<String, Object> firstExperience = new HashMap<>();
firstExperience.put("lang", "Swift");
firstExperience.put("years", "1");
So there is no need to duplicate data in this use case. I have also written an article on the same topic:
How to map an array of objects from Cloud Firestore to a List of objects?
Edit:
In your approach it provides the result only if experience is "1" and lang is "Swift", right?
That's correct, it only searches for one element. However, if you need to query for more than that:
Experience firstExperience = new Experience("Swift", "1");
Experience secondExperience = new Experience("Swift", "4");
//Up to ten
We use another approach, which is actually very simple. I'm talking about Query's whereArrayContainsAny() method:
Creates and returns a new Query with the additional filter that documents must contain the specified field, the value must be an array, and that the array must contain at least one value from the provided list.
And in code it should look like this:
jobsRef.whereArrayContainsAny("experience", Arrays.asList(firstExperience, secondExperience)).get().addOnCompleteListener(new OnCompleteListener<QuerySnapshot>() {
    @Override
    public void onComplete(@NonNull Task<QuerySnapshot> task) {
        if (task.isSuccessful()) {
            for (QueryDocumentSnapshot document : task.getResult()) {
                Job job = document.toObject(Job.class);
                Log.d(TAG, job.name);
            }
        } else {
            Log.d(TAG, task.getException().getMessage());
        }
    }
});
The result in the logcat will be:
firstJob
secondJob
thirdJob
And this is because all three documents contain one or the other object.
The reason I keep mentioning duplicated data in a document is that documents have limits; there is a cap on how much data you can put into a single document. According to the official documentation regarding usage and limits:
Maximum size for a document: 1 MiB (1,048,576 bytes)
As you can see, you are limited to a total of 1 MiB of data in a single document, so storing duplicated data only increases the chance of reaching that limit.
If I send null data for "experience" and "Swift" as "lang", will it be queried?
No, that will not work.
Edit2:
The whereArrayContainsAny() method works with a maximum of 10 objects. If you have 30, then you should save each query.get() of 10 objects into a Task object and pass them one by one to Tasks' whenAllSuccess(Task... tasks) method.
You can also pass them directly as a list to Tasks' whenAllSuccess(Collection<? extends Task<?>> tasks) method.
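A rough sketch of that batching approach (the list names and the split into tens are illustrative):

List<Experience> firstTen = allExperiences.subList(0, 10);
List<Experience> nextTen = allExperiences.subList(10, 20);
Task<QuerySnapshot> firstTask = jobsRef.whereArrayContainsAny("experience", firstTen).get();
Task<QuerySnapshot> secondTask = jobsRef.whereArrayContainsAny("experience", nextTen).get();
Tasks.whenAllSuccess(firstTask, secondTask).addOnSuccessListener(results -> {
    // results holds each query's QuerySnapshot, in the order the Tasks were passed.
});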
With your current document structure, it's not possible to perform the query you want. Firestore does not allow queries for individual fields of objects in list fields.
What you would have to do is create an additional field in your document that is queryable. For example, you could create a list field with only the list of string languages that are part of the document. With this, you could use an array-contains query to find the documents where a language is mentioned at least once.
For the document shown in your screenshot, you would have a list field called "languages" with values ["Swift", "Kotlin"].
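For example, with a "languages" field as described, the query would be along these lines:

db.collection("Jobs").whereArrayContains("languages", "Swift").get();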

Create Class object through class name in Salesforce

I am new to Salesforce and want to code one requirement: I have an API name in a string variable, through which I want to create an object of that class.
For example, the object name 'Account' is stored in a string variable:
String var = 'Account';
Now I want to create an object of 'Account'. In Java this is possible with Class.forName('Account'), but a similar approach is not working in Salesforce Apex.
Can anyone help me out with this?
Have a look at Type class. The documentation for it isn't terribly intuitive, perhaps you'll benefit more from the article that introduced it: https://developer.salesforce.com/blogs/developer-relations/2012/05/dynamic-apex-class-instantiation-in-summer-12.html
I'm using this in managed package code which inserts Chatter posts (feed items) only if the org has Chatter enabled:
sObject fItem = (sObject)System.Type.forName('FeedItem').newInstance();
fItem.put('ParentId', UserInfo.getUserId());
fItem.put('Body', 'Bla bla bla');
insert fItem;
(If I'd hardcode the FeedItem class & insert it directly it'd mark my package as "requires Chatter to run").
Alternative would be to build a JSON representation of your object (String with not only the type but also some field values). You could have a look at https://salesforce.stackexchange.com/questions/171926/how-to-deserialize-json-to-sobject
At the very least now you know what keywords & examples to search for :)
Edit:
Based on your code snippet - try something like this:
String typeName = 'List<Account>';
String content = '[{"attributes":{"type":"Account"},"Name":"Some account"},{"attributes":{"type":"Account"},"Name":"Another account"}]';
Type t = Type.forName(typeName);
List<sObject> parsed = (List<sObject>) JSON.deserialize(content, t);
System.debug(parsed);
System.debug(parsed[1].get('Name'));

// And if you need to write code "if it's accounts then do something special with them":
if (t == List<Account>.class) {
    List<Account> accs = (List<Account>) parsed;
    accs[1].BillingCountry = 'USA';
}

Design pattern to process different file types in OOP

I need to process a big set of files that at the moment are all loaded into memory in a List, i.e.:
List<FileClass> Files;
Use:
Reader.Files; // List of files of the type
The File class has one attribute to match each FileInfo object property, i.e. Name, CreateDate, etc.
It also has a List of Lines of the type (LineNumber, Data).
Now I need to create logic to interpret these files. Each file type has a different interpretation, and each will be loaded into its corresponding business object, i.e.:
Model model = new Model()
.emp => Process => Employee Class
.ord => Process => Order Class
model.AddObject(emp);
model.AddObject(ord);
My question what is the best design pattern for a problem of this sort.
All I can think of is... something like this:
public void ProcessFiles(List<FileClass> files)
{
    Model model = new Model();
    foreach (var file in files)
    {
        object obj = null;
        switch (Path.GetExtension(file.Name))
        {
            case ".emp":
                obj = BuildEmployee(file); // returns an Employee
                break;
            case ".ord":
                obj = BuildOrder(file); // returns an Order
                break;
        }
        model.AddObject(obj);
    }
}
Is there a better way to approach this?
This solution looks procedural to me. Is there a better object-oriented approach to it?
Cheers
UPDATE:
I've come across a few options to solve this:
1) Use of partial classes for separation of concerns.
I have a data model that I don't want to mix with file processing, database use, etc. (single responsibility).
DATA MODEL:
public partial class Employee
{
    public int EmployeeID;
    public string FirstName;
    public string LastName;
    public decimal Salary;
}
INTERPRETER/FILE PARSER:
This partial class defines the logic to parse .emp files.
// This portion of the partial class separates the data model
// from file processing
public partial class Employee
{
    public void ProcessFile(string fileName)
    {
        // Do processing
    }
    ...
}
Interpreter object:
public class Interpreter : IInterpreter
{
    public void Interpret(List<FileClass> files, Model model)
    {
        foreach (var file in files)
        {
            // Assumes Employee and Order share an interface exposing ProcessFile
            IFileProcessor obj = null;
            switch (Path.GetExtension(file.Name))
            {
                case ".emp":
                    obj = new Employee();
                    break;
                case ".ord":
                    obj = new Order();
                    break;
            }
            obj.ProcessFile(file);
            model.AddObject(obj);
        }
    }
}
2) Perhaps using some sort of factory pattern.
The input is the file, whose extension identifies its type.
This drives the type of object to be created (i.e. Employee, Order, anything) and also the logic to parse the file. Any ideas?
Well, it seems that you want to vary the processing behaviour based on the file type. Behaviour and Type being keywords. Does any behavioural pattern suit your requirement?
Or is it that the object creation is driven by the input file type? Then creation and type become important keywords.
You might want to take a look at strategy and factory method patterns.
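To make that concrete, here is a minimal Java sketch of the factory-plus-strategy idea. All names here (FileProcessor, ProcessorFactory, the stub types) are illustrative, not taken from the question:

import java.util.HashMap;
import java.util.Map;

// Illustrative stubs standing in for the question's types.
class FileClass { String name; }
class Employee { }
class Order { }

// Strategy: one parser per file type, all behind the same interface.
interface FileProcessor {
    Object process(FileClass file);
}

// Factory: picks the right strategy based on the file extension.
class ProcessorFactory {
    private static final Map<String, FileProcessor> registry = new HashMap<>();
    static {
        registry.put("emp", file -> new Employee()); // real code would parse the file's lines
        registry.put("ord", file -> new Order());
    }

    static FileProcessor forExtension(String ext) {
        FileProcessor p = registry.get(ext);
        if (p == null) {
            throw new IllegalArgumentException("No processor for extension: " + ext);
        }
        return p;
    }
}

The processing loop then shrinks to a single factory lookup per file, and supporting a new file type means registering one new strategy instead of growing a switch.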
Here is something from the book Refactoring to Patterns:
The overuse of patterns tends to result from being patterns happy. We are patterns happy when we become so enamored of patterns that we simply must use them in our code. A patterns-happy programmer may work hard to use patterns on a system just to get the experience of implementing them or maybe to gain a reputation for writing really good, complex code.

A programmer named Jason Tiscione, writing on SlashDot (see http://developers.slashdot.org/comments.pl?sid=33602&cid=3636102), perfectly caricatured patterns-happy code with the following version of Hello World. ..... It is perhaps impossible to avoid being patterns happy on the road to learning patterns. In fact, most of us learn by making mistakes. I've been patterns happy on more than one occasion.

The true joy of patterns comes from using them wisely.

Displaying Mutable PostgreSQL Arrays in the NetBeans Master/Detail Sample Form using JPA 1.0

Some Background
I have a game database with a table called Games that has multiple attributes, including one called Genres. The Genres attribute is defined as an integer[] in PostgreSQL. For the sake of simplicity, I'm not using any foreign key constraints, but essentially each integer in this array is a foreign key on the id attribute in the Genres table. This is my first time working with the NetBeans Master/Detail Sample Form and Java persistence, and it's been working great so far except for one thing: I get this error when the program tries to display a column that holds a 1-dimensional integer array. In this example, the value is {1, 11}.
Exception Description: The object [{1,11}], of class [class org.postgresql.jdbc3.Jdbc3Array], from mapping [oracle.toplink.essentials.mappings.DirectToFieldMapping[genres-->final.public.games.genres]] with descriptor [RelationalDescriptor(finalproject.Games --> [DatabaseTable(final.public.games)])], could not be converted to [class [B].
Exception [TOPLINK-3002] (Oracle TopLink Essentials - 2.0.1 (Build b09d-fcs (12/06/2007))): oracle.toplink.essentials.exceptions.ConversionException
My Research
From what I've been able to read, it looks like PostgreSQL arrays need something special done to them before you can display and edit them in this template. By default, the sample form uses TopLink Essentials (JPA 1.0) as its persistence library, but I can also use Hibernate (JPA 1.0).
Here is the code that needs to be changed in some way. From the Games.java file:
@Entity
@Table(name = "games", catalog = "final", schema = "public")
@NamedQueries({
    // omitting named queries
    @NamedQuery(name = "Games.findByGenres", query = "SELECT g FROM Games g WHERE g.genres = :genres")
})
public class Games implements Serializable {

    @Transient
    private PropertyChangeSupport changeSupport = new PropertyChangeSupport(this);
    private static final long serialVersionUID = 1L;

    // omitting other attributes

    @Column(name = "genres")
    private Serializable genres;

    // omitting constructors and other getters/setters

    public Serializable getGenres() {
        return genres;
    }

    public void setGenres(Serializable genres) {
        Serializable oldGenres = this.genres;
        this.genres = genres;
        changeSupport.firePropertyChange("genres", oldGenres, genres);
    }
} // end class Games
Here are also some of the sites that might have the solution that I'm just not understanding:
https://forum.hibernate.org/viewtopic.php?t=946973
http://blog.xebia.com/2009/11/09/understanding-and-writing-hibernate-user-types/
// omitted hyperlink due to user restriction
Attempted Solutions
I'm able to get the data to display if I change the type of genres to String, but it is immutable and I cannot edit it. This is what I changed to do this:
#Column(name = "genres")
private String genres;
public String getGenres() {
return genres;
}
public void setGenres(String genres) {
String oldGenres = this.genres;
this.genres = genres;
changeSupport.firePropertyChange("genres", oldGenres, genres);
}
I also attempted to create a UserType file for use with Hibernate (JPA 1.0), but had no idea what was going wrong there.
I also attempted to use the @OneToMany and other annotations, but these aren't working, probably because I'm not using them properly.
What I'm Looking For
There has to be a simple way to get this data to display and make it editable, but since I'm completely new to persistence, I have no idea what to do.
The effort put into your question shows. Unfortunately, JPA does not currently support PostgreSQL arrays. The fundamental problem is that arrays are not widely used in other databases, so heavy reliance on them is somewhat PostgreSQL-specific. Thus you can expect that general cross-db persistence APIs will not support them well, if at all. JPA is no exception, currently having no support for PostgreSQL arrays.
I have been looking at writing my own persistence API in Java that would support arrays, but it hasn't happened yet, would be PostgreSQL-only when written, and would be based on a very different principle than JPA and friends.

Grails iterating through database tables

As I'm a bit new to Grails, I'm wondering how I can iterate through the data currently saved in the database to check whether a given piece of information already exists.
For instance, let's say I have a domain class Book and I create an action that automatically adds more books, but I want to check whether the book title already exists so I don't add it again.
Side note
I'm just using the default database (whatever is used when the project is set to production mode)
Edit
I'll post my domain classes so it is a bit easier to understand. Instead of checking book.title, I changed it so that a Book belongs to an Author. So I only want each author added once, while being able to add many books for it. The issue happens in an action I created in the controller.
Author Domain:
class Author {
    static hasMany = [books: Book]

    String authorName
    String notes
    String age

    String toString() { authorName }

    static constraints = {
        authorName()
        notes(maxSize: 500)
        age()
    }
}
Book Domain:
class Book {
    static belongsTo = Author

    Author bookAuthor
    String title
    String numberOfPages

    String toString() { title }

    static constraints = {
        bookAuthor()
        title()
        numberOfPages()
    }
}
Book Controller (this is where I'm having issues):
class BookController {

    static allowedMethods = [save: "POST", update: "POST", delete: "POST"]

    // took out index, create, list, etc. to focus on the ones that I'm concerned about

    // this action will simply read a text file and add books and authors
    def gather = {
        def parseData = new parseClient()      // parses the text file line by line and puts it into a list
        def dataHolder = parseData.information // dataHolder will hold data from the text file
        int linesOfData = dataHolder.size()    // get size to iterate and add authors & books

        linesOfData.times {
            def _author = dataHolder.author[it] // get author - Author
            def _age = dataHolder.age[it]       // get age - Author
            def _title = dataHolder.title[it]   // get title - Book
            def _pages = dataHolder.pages[it]   // get pages - Book

            // for some reason I have to create and save AuthorName (can't have other fields)
            // before I can add my Book correctly
            def authorInstance = new Author()
            authorInstance.setAuthorName(_author)
            authorInstance.save()

            // create a new Book to add; it has to have access to the authorInstance to add
            // correctly, which is why I was wondering how to access the database to grab it
            // if it existed
            def bookInstance = new Book()
            bookInstance.setBookAuthor(authorInstance)
            bookInstance.setTitle(_title)
            bookInstance.setNumberOfPages(_pages)
            bookInstance.save()

            authorInstance.setAge(_age) // add whatever data is left for Author
            authorInstance.save()       // save again because I can't save this information
                                        // before adding authorInstance to the Book
        }
    }
}
Text File Content:
// You'll notice that the author _Scott Davis_ is in here twice.
// I don't want to add two instances of Scott Davis, but I need access to the existing one to add the book.
// With the unique constraint, the value comes up as not null but can't be added.
Scott Davis : Groovy Recipes
Bashar Abdul Jawad : Groovy and Grails Recipes
Fergal Dearle : Groovy for Domain-Specific Languages
Scott Davis : GIS for Web Developers: Adding 'Where' to Your Web Applications
So I'm basically looking for a way to add this information; I haven't found an approach that works without running into random problems.
I hope this clears my question up a bit, as the original question was a bit broad.
In this case, you can put a unique constraint on title and Grails will do that check for you.
You can iterate through the data, but you probably don't want to load the whole db if you can avoid it. So you could write a custom query that selects the number of books with the given title, for example.
Edit, for your updates:
You don't need to use setters in your controller; Grails adds setters/getters for you dynamically at runtime. If you want to put some logic in your setters, then you can define your own and use them in that case.
Have you looked at the grails documentation? http://grails.org/doc/latest/
You have a static constraints block, but you haven't defined how you want each property to be constrained. For a unique title it would be:
title(unique:true)
If you want to get a list of author names, you can do:
List names = Author.executeQuery('select authorName from Author')
