Supervised Learning with Form Recognizer

I am trying to analyze a form using Microsoft's Form Recognizer API, but I am not seeing the results I had hoped for. After training the model on my form, the keys it generates are very rarely what I want them to be. Does anyone know of a method to improve the accuracy of key recognition? I was thinking there might be some way to supply a list of key/value pairs when training, as a form of supervised learning.
Here is a sample of the form I'm trying to parse.
I'd expect keys of 'Year', 'Make', 'Model', and 'VIN'. Instead, the model returns a single key, 'Vehicle', whose value contains 'Year', 'Make', 'Model', and 'VIN' along with their respective values.
I know I specifically asked about supervised learning, but really any techniques or tips on how to improve the accuracy of a Form Recognizer model would be appreciated.

Azure Form Recognizer now offers a Supervised Learning Tool to tune models for forms that are difficult to train with the default unsupervised learning mode.
Here's where you can find the tool:
https://learn.microsoft.com/en-us/azure/cognitive-services/form-recognizer/quickstarts/label-tool

Did you train the model with 5 sample forms? Can you try adding an empty form, without the values, to the training data and see if it helps? Are the forms good-quality scans, or are they tilted?
The following tips can help you improve the accuracy:
How to build a training data set for a custom model
When you use the Form Recognizer custom model, you provide your own training data so the model can train to your industry-specific forms. You can train a model with five filled-in forms or an empty form (include the word "empty" in the file name) plus two filled-in forms. Even if you have enough filled-in forms to train with, adding an empty form to your training data set can improve the accuracy of the model.
It's important to use a data set that's optimized for training. Use the following tips to ensure you get the best results from the Train Model operation:
• If possible, use text-based PDF documents instead of image-based documents. Scanned PDFs are handled as images.
• Use one empty form and two filled-in forms if you have them available.
• For filled-in forms, use examples that have all of their fields filled in.
• Use forms with different values in each field.
• If your form images are of lower quality, use a larger data set (10-15 images, for example).
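The minimums quoted above (five filled-in forms, or one empty form plus two filled-in forms) can be checked before uploading a training set. The following is only a minimal sketch: the helper name `summarize_training_set` and the sample file names are hypothetical, and the only rule it takes from the text is that an empty form has the word "empty" in its file name.

```python
from pathlib import PurePath


def summarize_training_set(filenames):
    """Count empty and filled-in forms and check the minimums quoted above.

    Hypothetical helper: a file counts as an empty form when its name
    contains the word "empty" (case-insensitive).
    """
    empty = [f for f in filenames if "empty" in PurePath(f).name.lower()]
    filled = [f for f in filenames if f not in empty]
    # Either five filled-in forms, or one empty form plus two filled-in forms.
    enough = len(filled) >= 5 or (len(empty) >= 1 and len(filled) >= 2)
    return {"empty": len(empty), "filled": len(filled), "enough": enough}


# Hypothetical file names, for illustration only.
files = ["vehicle_empty.pdf", "vehicle_01.pdf", "vehicle_02.pdf"]
print(summarize_training_set(files))
```

This only counts files; it says nothing about scan quality or whether every field is filled in, which the tips above also call for.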

Related

WordPress: make categories automatically match according to external API value

I'm managing a company website where we have to display our products. However, we do not want to handle the admin edit screen for this CPT, nor offer access to the form. We do still have to read some product data from the admin edit page. Everything has to be created or updated automatically via our CRM platform.
For this, I have already set up a CPT (wprc_pr) and registered 6 custom hierarchical taxonomies: 1 generic one for the types (wprc_pr_type) and 5 targeting each available type: wprc_pr_rb, wprc_pr_sp, wprc_pr_pe, wprc_pr_ce and wprc_pr_pr. All those taxonomies are required for filtering purposes (this was the old way of working, maybe not the best; I'm open to suggestions here). We end up with archive page links looking like site.tld/generic/specific-parent/specific-child/, which is what we want.
I have an internal Node.js-based tool to batch-create products from our CRM. The job is simple: get all products not yet pushed to the website, format a new post, push it to the WP REST API, wait for the response, update the CRM data accordingly, and proceed to the next product. It handled about 1,600 products today in a trial, and each went fine.
The issue for now is that, in order to assign the correct terms to the new post, I have to compute the generic type and the specific type children for each product.
I handled that by creating 6 files, one for each taxonomy. Each file is basically a giant JS object with the id from the CRM as a key and the term id as a value. My script handles the category assignment like this:
wp_taxonomy = [jsTaxonomyMapper[crm_id1][crm_id2]] // or [] if not found
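The "or [] if not found" part of that lookup can be made explicit. The sketch below uses Python purely for illustration (the actual tool is Node.js), and `lookup_terms`, `sample_mapper`, and the ids in it are hypothetical names and values, not taken from the real mappers.

```python
def lookup_terms(mapper, crm_parent_id, crm_child_id):
    """Return [term_id] for a CRM id pair, or [] when either level is missing."""
    term_id = mapper.get(crm_parent_id, {}).get(crm_child_id)
    return [] if term_id is None else [term_id]


# Hypothetical mapper: CRM parent id -> CRM child id -> WordPress term id.
sample_mapper = {12: {3: 481, 4: 482}}

print(lookup_terms(sample_mapper, 12, 3))  # [481]
print(lookup_terms(sample_mapper, 99, 3))  # []
```

The two-step `.get` keeps a missing first-level id from raising an error, which a plain chained index would do.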
I have to say it is working pretty well, and I could stop here. But I will have to move that computation into the wp_after_insert_post hook, in order to reassign the post to the desired category on update if something changed in the CRM.
That is not especially difficult, but if I add a category in the CRM, I'll have to manually edit my mappers to add the new terms, and believe me, that's a hassle.
I'm not expecting a full solution here, just a way to approach the problem. Maybe a way to compute those mappers and store their values in the options table, or a mapper class; I don't know at all.
Additional information:
Data from the CRM comes as integers (ids corresponding to a label), and the mappers today consist of 6 arrays (nested or not) with about 600 entries in total.
If you have something for me, or even suggestions to simplify the process, I'll take it.
Thanks.
EDIT :
Went with another approach, see comment below.

Search/Filter Dropdown in Django Admin Panel for standard CharField

I am using Django to build data models, including a model Company. Some of the fields belonging to this model are limited to set choices using the choices argument. Some of these fields have a large number of choices, for example a country CharField which lists all countries. Finding the right value in the long list can be tedious, so I want to be able to search across the given choice values. This is easy to do for ForeignKey/ManyToMany fields using autocomplete_fields = [], as seen in the attached screenshot (from a different model), but I can't seem to find a method for implementing this with a normal CharField with lots of choices. This seems like something that should be built into the Django admin panel, but I can't find anything in the docs. How can I implement a search/filter dropdown for any given (non-FK/M2M) field? Thanks in advance. If there's any more information I can provide, let me know and I will.
country = models.CharField(max_length=128, choices=COUNTRY_CHOICES, null=True)
Model code added. This COUNTRY_CHOICES array is what I would like to be able to select from in a searchable dropdown list.

Firestore Data Modelling ( Users and Problems)

I am working on a web app where users work through a set of problems. The database is Firestore.
The feature I want to ask about is that a page will show {x}/{y},
where x is the number of problems the user completed, and y is the total number of problems, something like 6/10.
Sample image
My question is how to design the data model to achieve this feature. I already have a users collection and a problems collection. I have thought of the following options, but none of them feels like good practice:
A. Adding a user_response field in each problem doc, which means that if there are 1000 users, each problem doc has to hold 1000 users' responses. It is also hard to handle new users joining.
B. Quite like A but adding problems_response into user doc.
C. Adding a new collection responses which has problem_id and user_id fields.
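To make option C concrete, here is a rough sketch of how the {x}/{y} figure could be derived from such a collection. It is only an illustration under stated assumptions: a plain Python list stands in for the Firestore collection, the `progress` helper and the sample ids are hypothetical, and a real app would query Firestore by user_id rather than scan a list.

```python
def progress(responses, user_id, total_problems):
    """Return 'x/y', where x counts the distinct problems the user completed."""
    done = {r["problem_id"] for r in responses if r["user_id"] == user_id}
    return f"{len(done)}/{total_problems}"


# In-memory stand-in for the responses collection (hypothetical sample data).
responses = [
    {"user_id": "u1", "problem_id": "p1"},
    {"user_id": "u1", "problem_id": "p2"},
    {"user_id": "u2", "problem_id": "p1"},
]

print(progress(responses, "u1", 10))  # 2/10
```

With this layout, a new user needs no setup at all: having no response documents simply yields 0/y.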
I am new to NoSQL, thanks for your help.
I added the database structure in the following images:
[Images: database-1, database-2, database-3, database-4, database-5]

How can I make a photograph database that displays the results of a query as some kind of image gallery?

I am taking a database design class and, for a project, want to make a database of my mom's digital photos for her. I haven't dealt with applications up to this point, only theory, but I have Access. Therefore, ideal answers don't suggest non-database solutions and don't assume I know much about actual database implementation. Solutions specific to Access would also be a plus. I hope that preamble saves some time and effort.
Suppose my mom wants to see all photos of pets from '05-'07 in raw format, and she enters an appropriate query. I suspect I can handle it up to there. However, at the moment, the best I can figure out is to return a column of either attachments or OLE objects. 5 clicks per photo is not ideal. I need a faster way to present the images. Opening them all in a grid of thumbnails or as a one-click slide show would seem the natural fit, but whatever works. How can I accomplish this?
Less important but worth consideration is the fact that, at some point, it would be great if this same type of system could be implemented on the internet for all of the family reunion photos she has taken, but I will take what I can get.

Use one form to get the parameters for the query, then use another form (for more processing) or a report (if printing) to show the selected pictures. I will not cover passing parameters, but here are some links.
https://www.fmsinc.com/microsoftaccess/forms/openargs/index.htm
https://learn.microsoft.com/en-us/office/vba/access/concepts/forms-design/apply-a-filter-when-opening-a-form-or-report
There is a complication: in Access, pictures are usually stored in the Attachment type, and an attachment column can hold many pictures in each record. So if we have a table called Pictures with an attachment column also called Pictures, then each individual picture is actually stored under Pictures.Pictures.FileData.
So, to display the picture query, we use a form or report with its default view set to Continuous Forms (which displays many records, or in this case pictures, on the same page). Then, in the detail section of our display form, we place an attachment control and bind that control to our filtered Pictures.FileData.
Format and add functionality to taste.

How to query the Watson Discovery API?

I am experimenting with IBM Watson's Discovery API to get data insights. I want to query using multiple filters. I am using Python to accomplish the task. I have tried the following so far, but it is not working.
qopts = {'filter':[{'enriched_text.entities.text:Recurrent Neural Networks,Machine Learning classifiers'}]}
my_query = discovery.query(env_id, coll_id, qopts)
With only a single entity, 'recurrent Neural Networks', I get 3 documents from the collection both through the Discovery UI and through my Python query.
But with two entities, 'Recurrent Neural Networks,Machine Learning classifiers', in the UI I get 2 documents but through my code, I get 2 documents.

Below is the right format, which works for me. With multiple concept and keyword filters, I get a total of 2 search results, which matches the UI query:
qopts = {'filter':{'enriched_text.concepts.text:"Neural network",enriched_text.keywords.text:"Neural Network",enriched_text.keywords.text:"generative conversational models"'}}
With only a single filter, I get 3 matching results:
qopts = {'filter':{'enriched_text.concepts.text:"Neural network"'}}
In this example, I am querying the documents with the concept 'Neural network' and the keywords 'Neural Network' and 'generative conversational models'.
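The working filter above is one string in which each condition has the form field:"value" and the conditions are separated by commas. As a small sketch, that string could be assembled from a list of pairs. The `build_filter` helper is a hypothetical name, not part of the Watson SDK, and it assumes the values contain no double quotes (nothing is escaped).

```python
def build_filter(conditions):
    """Join (field, value) pairs into one comma-separated filter string.

    Assumption: values contain no double quotes; no escaping is done.
    """
    return ",".join(f'{field}:"{value}"' for field, value in conditions)


conditions = [
    ("enriched_text.concepts.text", "Neural network"),
    ("enriched_text.keywords.text", "Neural Network"),
    ("enriched_text.keywords.text", "generative conversational models"),
]

filter_string = build_filter(conditions)
print(filter_string)
```

The resulting string is the same text as the multi-condition filter shown above, so it can be supplied wherever that filter value is used.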

According to the Watson Discovery documentation, inside the UI you would use:
But obviously, without the ! operator inside the second text.
And I think inside your code you need to use , between the values.
I'm not sure, because I don't use enriched_text.entities.text inside my filter, just the strings.
One possible reference for another example to test:
filter=field1:some value,field2:another value
Official reference documentation: here.
