Date field not recognized with Azure Form Recognizer

I have built a custom neural model using the Form Recognizer Studio. I have marked the date fields when I labeled the data to build the model.
I have problems extracting the exact date value using the following Java SDK:
com.azure:azure-ai-formrecognizer:4.0.0-beta.5
The returned JSON (as previewed in the Form Recognizer Studio) is:
"Start Date": {
  "type": "date",
  "content": "01.05.2022",
  "boundingRegions": [
    {
      "pageNumber": 1,
      "polygon": [
        1.6025, 4.0802,
        2.148, 4.0802,
        2.148, 4.1613,
        1.6025, 4.1613
      ]
    }
  ],
  "confidence": 0.981,
  "spans": [
    {
      "offset": 910,
      "length": 10
    }
  ]
}
With the Java SDK, getValueDate() returns null for this field, while getContent() returns the correct string value.

Most likely the issue occurs because the document I use is not in English and the date format might not be recognized. As per documentation here: https://learn.microsoft.com/en-us/azure/applied-ai-services/form-recognizer/language-support only English is supported for custom neural models.
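A possible workaround, sketched under assumptions: since getContent() still returns the raw string, the date can be parsed on the client side whenever the typed value is null. The sketch below is in Python for brevity; `parse_field_date` and the list of candidate formats are hypothetical and not part of any SDK. In Java, the same idea could be expressed with `LocalDate.parse` and `DateTimeFormatter.ofPattern("dd.MM.yyyy")`.

```python
from datetime import date, datetime
from typing import Optional

# Assumed candidate formats; extend this for the locales your documents use.
CANDIDATE_FORMATS = ("%d.%m.%Y", "%d/%m/%Y", "%Y-%m-%d")


def parse_field_date(value_date: Optional[date], content: Optional[str]) -> Optional[date]:
    """Prefer the date parsed by the service; otherwise parse the raw content."""
    if value_date is not None:
        return value_date
    if not content:
        return None
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(content.strip(), fmt).date()
        except ValueError:
            continue  # try the next candidate format
    return None


# The field from the question: typed value is null, content is "01.05.2022".
print(parse_field_date(None, "01.05.2022"))  # 2022-05-01
```

Trying day-first formats before others is a deliberate choice for the "01.05.2022" style shown above; ambiguous inputs would need a format agreed on per document type.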

Related

multiple 'instances' of the same labelled field on the same page in Azure Form Recognizer custom model?

I was wondering if there is something I'm missing on dealing with multiple instances of the same labelled field in an Azure Form Recognizer Custom Model (with labels)? Let's use the following (VERY simplified) document, for example:
Now, if I train a model to detect 'Name', 'DOB', and 'Company', I end up with results that look like:
{
  "fields": {
    "Name": {
      "value_type": "string",
      "label_data": null,
      "value_data": {
        "page_number": 1,
        "text": "John R. Smith Ronald Johnson., Esquire",
        "bounding_box": [
          [0.57, 4.435],
          [1.8, 4.435],
          [1.8, 6.005],
          [0.57, 6.005]
        ],
        "field_elements": null
      },
      "name": "Name",
      "value": "John R. Smith Ronald Johnson., Esquire",
      "confidence": 1
    },
    ...
As you can see, there is no delimiter between each 'instance' of the Name field in the Azure Form Recognizer results JSON. How should I train and/or deal with the Field results in a way that allows me to extract each instance of a given field from the document?
The first thing I tried was marking both the label name and the value of a field in the document and training on that. For example, Name: John R. Smith and Name: Ronald Johnson., Esquire would be what I marked in FOTT as the Name field for this training example. Then I would split the result on Name:. This seems fine in theory, but in practice I ended up with VERY low accuracy compared to selecting JUST the field value and training on that.
Please label these as Name1 and Name2 to extract them as separate fields.
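If the fields are labelled Name1 and Name2 as suggested, the separate values can be gathered back into one list on the client side. A minimal sketch, assuming the result has the shape shown in the question (a `fields` mapping whose entries carry a `value`); `collect_instances` is a hypothetical helper, not an SDK function:

```python
import re
from typing import Dict, List


def collect_instances(fields: Dict[str, dict], base: str) -> List[str]:
    """Gather the values of fields named <base>1, <base>2, ... in numeric order."""
    pattern = re.compile(rf"^{re.escape(base)}(\d+)$")
    numbered = []
    for name, field in fields.items():
        match = pattern.match(name)
        if match and field.get("value") is not None:
            numbered.append((int(match.group(1)), field["value"]))
    numbered.sort(key=lambda pair: pair[0])
    return [value for _, value in numbered]


# Assumed result shape after relabelling; other fields are ignored.
fields = {
    "Name2": {"value": "Ronald Johnson., Esquire", "confidence": 1},
    "Name1": {"value": "John R. Smith", "confidence": 1},
    "DOB": {"value": "1970-01-01", "confidence": 1},
}
print(collect_instances(fields, "Name"))
```

This only works when the maximum number of instances per page is known at labelling time, since each instance needs its own labelled field.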

CakePHP 4 - custom request and response format

For a new project I want to use CakePHP 4 as a REST backend with a Vue.js frontend.
Cake uses a nested data structure, while Vue.js uses a flat data structure.
My plan is to convert the data in the backend.
Example Format:
CakePHP
{
  "user": {
    "id": 1,
    "name": "Peter Maus",
    "articles": [
      {
        "id": 15,
        "title": "First Post"
      }
    ]
  }
}
Vue.js
{
  "user": {
    "id": 1,
    "name": "Peter Maus",
    "articles": [ 15 ]
  },
  "articles": [
    {
      "id": 15,
      "title": "First Post"
    }
  ]
}
So basically, instead of just sending JSON with
$this->viewBuilder()->setOption('serialize', ['user']);
I want to first convert the data structure and then send it as JSON.
I have found the following options for the conversion in the documentation:
Request - convert from Vue to Cake
I have seen that you can use the Body Parser Middleware with your own parser.
But I still have JSON as the response format, and I don't want to override the standard JSON formatter.
Response - convert from Cake to Vue
Ideas:
- I have seen "Data Views", but I'm not sure whether they are suitable for this purpose.
- Extend the ViewBuilder and write my own serialize() function. How would I plug in my own ViewBuilder, and is that even possible?
- Write a parse function in a parent entity from which all my entities inherit, and call that function before serializing the data.
I will probably need access to the entity relations to restructure the data dynamically, for both request and response.
What would be a reasonable approach?
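To make the target transformation concrete, here is a language-agnostic sketch of the nested-to-flat conversion, written in Python rather than PHP. It is not CakePHP code and does not settle where the logic should live; `normalize` is a hypothetical helper, and it assumes that every nested entity carries an `id` and that only one level of nesting needs to be lifted.

```python
from typing import Any, Dict


def normalize(root_key: str, entity: Dict[str, Any]) -> Dict[str, Any]:
    """Lift nested lists of entities to top-level collections, leaving ids behind."""
    flat_entity: Dict[str, Any] = {}
    collections: Dict[str, Any] = {}
    for key, value in entity.items():
        is_entity_list = isinstance(value, list) and all(
            isinstance(item, dict) and "id" in item for item in value
        )
        if is_entity_list:
            # Keep only the ids on the parent and move the full records out.
            flat_entity[key] = [item["id"] for item in value]
            collections.setdefault(key, []).extend(value)
        else:
            flat_entity[key] = value
    return {root_key: flat_entity, **collections}


nested_user = {
    "id": 1,
    "name": "Peter Maus",
    "articles": [{"id": 15, "title": "First Post"}],
}
print(normalize("user", nested_user))
```

Detecting associations by inspecting the values is a simplification made for this sketch; driving the conversion from the entity relations, as the question suggests, would avoid guessing which lists hold entities.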

Drupal Services Plugin ignoring multi-value fields

I'm using Drupal 7 with Services plugin 3.17.
I'm trying to create a node with a field that accepts multiple values via the JSON API with the following data:
{
"type":"custom_type_article",
"title":"My title",
"language":"und",
"body": {
"und": [ { "value": "Article body" } ]
},
"field_article_auhtors": {
"und": [{"value": "author 1"}, {"value": "author 2"}, {"value": "author 3"}]
}
}
The node is successfully created, but only the first value of field_article_auhtors is populated.
Is my JSON structure incorrect for creating multiple values on "field_article_auhtors"?
Version 3.17 of Services has a bug with multi-value fields. It looks like the bug is a regression introduced around version 3.6.
A patch was released in November, and multiple users report that it works, though the issue is officially still marked 'Needs Work'. The author has asked for a review of the code, and the patch has already been included in the dev version of Services. That said, a gentle reminder: test it in a dev environment first. ;)
See the conversation, the patch, and a dev release of Services that includes it in the Services project issue queue on drupal.org: https://www.drupal.org/project/services/issues/2224803

How can you retrieve a full nested document in Solr?

In my instance of Solr 4.10.3 I would like to index JSON documents with a nested structure.
Example:
{
  "id": "myDoc",
  "title": "myTitle",
  "nestedDoc": {
    "name": "test name",
    "nestedAttribute": {
      "attr1": "attr1Val"
    }
  }
}
I am able to store it correctly through the admin interface:
/solr/#/mySchema/documents
and I'm also able to search and retrieve the document.
The problem I'm facing is that when I get the response document from my Solr search, I cannot see the nested attributes. I only see:
{
  "id": "myDoc",
  "title": "myTitle"
}
Is there a way to include ALL the nested fields in the returned documents?
I tried "fl=[child parentFilter=title:myTitle]", but it's not working (ChildDocTransformerFactory, from https://cwiki.apache.org/confluence/display/solr/Transforming+Result+Documents). Is that the right way to do it, or is there another way?
I'm using Solr 4.10.3!
To get the whole nested structure returned, you do need to use ChildDocTransformerFactory. However, you first need to index your documents properly.
If you pass your structure as it is, Solr will index the parts as separate documents and won't know that they are connected. To query nested documents correctly, you'll have to pre-process your data structure as described in this post, or try using (and modifying as needed) a pre-processing script. Unfortunately, up to and including the latest Solr 6.0, there is no smooth solution for indexing and returning nested document structures, so everything is done through workarounds.
In your case specifically, you'll need to transform your document structure into this:
{
  "type": "parentDoc",
  "id": "myDoc",
  "title": "myTitle",
  "_childDocuments_": [
    {
      "type": "nestedDoc",
      "name": "test name",
      "_childDocuments_": [
        {
          "type": "nestedAttribute",
          "attr1": "attr1Val"
        }
      ]
    }
  ]
}
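The pre-processing step can be sketched as a small recursive function. This is an illustration in Python under assumptions, not the script the answer refers to: `to_child_documents` is a hypothetical helper, the `type` value for each child is taken from the key it was nested under, and only nested objects (not arrays of objects) are handled.

```python
import json
from typing import Any, Dict


def to_child_documents(doc: Dict[str, Any], doc_type: str) -> Dict[str, Any]:
    """Rewrite nested objects as Solr-style _childDocuments_ entries."""
    out: Dict[str, Any] = {"type": doc_type}
    children = []
    for key, value in doc.items():
        if isinstance(value, dict):
            # A nested object becomes a child document typed by its key.
            children.append(to_child_documents(value, key))
        else:
            out[key] = value
    if children:
        out["_childDocuments_"] = children
    return out


original = {
    "id": "myDoc",
    "title": "myTitle",
    "nestedDoc": {
        "name": "test name",
        "nestedAttribute": {"attr1": "attr1Val"},
    },
}
print(json.dumps(to_child_documents(original, "parentDoc"), indent=2))
```

The sketch mirrors the structure shown above and does not add identifiers to the children; if your schema requires a unique key on every document, the child documents would need one as well.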
Then the following ChildDocTransformerFactory query will return all subdocuments (by the way, although the documentation says it is available since Solr 4.9, I have only actually seen it in Solr 5.3, so you need to test):
q=title:myTitle&fl=*,[child parentFilter=type:parentDoc limit=50]
Note that although it returns all nested documents, the returned document structure will be flattened (alas!), i.e., you'll get:
{
  "type": "parentDoc",
  "id": "myDoc",
  "title": "myTitle",
  "_childDocuments_": [
    {
      "type": "nestedDoc",
      "name": "test name"
    },
    {
      "type": "nestedAttribute",
      "attr1": "attr1Val"
    }
  ]
}
This is probably not what you expected, but it is unfortunately how Solr behaves; it is due to be fixed in a near-future release.
You can put
q={!parent which=}
and in the fl parameter: fl=*,[child parentFilter=title:myTitle]
It will give you all parent fields and child fields of title:myTitle.

Solr, adding a record via JSON with a multi-value field and boosted values

I'm pretty new to Solr and I'm trying to add a multi-value field with a boost defined for each value, all via JSON. In other words, I'd like this to work:
[{ "id": "ID1000",
"tag": [
{ "boost": 1, "value": "A test value" },
{ "boost": 2, "value": "A boosted value" } ]
}]
I know how to do that in XML (multiple <field name = 'tag' boost = '...'>), but the JSON code above doesn't work; the server says "Error parsing JSON field value. Unexpected OBJECT_START". Is this a Solr limitation or a bug?
PS: I fixed the originally-missing ']' and that's not the problem.
EDIT: It seems the way to go should be payloads (http://wiki.apache.org/solr/Payloads), but I couldn't get them to work in Solr (I followed this: http://sujitpal.blogspot.co.uk/2011/01/payloads-with-solr.html). Leaving the question open to see if someone can help further.
I found the following sentence in the Solr Relevancy FAQ, in the Query Elevation Component section:
An Index-time boost on a value of a multiValued field applies to all values for that field.
I do not think adding an individual boost to each value in the multivalued field is going to work. I know that the XML will allow it, but I would guess that it only applies the boost from the last value added to the field.
So based on that I would change the JSON to the following and see if that works.
[
{
"id": "ID1000",
"tag": {
"boost": 2,
"value": [ "A test value", "A boosted value"]
}
}
]
The JSON seems to be invalid: it is missing a closing ]
[
{
"id": "ID1000",
"tag": [
{
"boost": 1,
"value": "A test value"
},
{
"boost": 2,
"value": "A boosted value"
}
]
}
]
You hit an edge case. You can have boosts on single values and you can have an array of values, but not one inside the other (from my reading of the Solr 4.1 source code).
That might be something to create as an enhancement request.
If you are generating that JSON by hand, you can try:
"tag": { "boost": 1, "value": "A test value" },
"tag": { "boost": 2, "value": "A boosted value" }
I believe Solr will merge the values then. But if you are generating it via a framework, it will most likely disallow or override repeated object property names (tag here).
The error has nothing to do with boosting.
I get the same error with a very simple json doc.
No luck solving it.
See: Solr errors when trying to parse a collection: Error parsing JSON field value. Unexpected OBJECT_START
I hit the same error message. The message was actually misleading: the real underlying error was that two of the fields marked as required in schema.xml in the Solr configuration were missing from the JSON payload.
An error message along the lines of "required fields are missing in the document" would have been more helpful here. You might want to check whether some required fields are missing from your JSON payload.
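A quick client-side check can rule this cause out before posting. A minimal sketch in Python, assuming the list of required field names has been copied by hand from your schema.xml; `REQUIRED_FIELDS` and `missing_required` are hypothetical and the field names are placeholders:

```python
from typing import Any, Dict, Iterable, List

# Placeholder names; replace with the fields marked required="true" in schema.xml.
REQUIRED_FIELDS = ("id", "title", "category")


def missing_required(doc: Dict[str, Any], required: Iterable[str] = REQUIRED_FIELDS) -> List[str]:
    """Return the required field names that are absent or empty in the document."""
    return [name for name in required if doc.get(name) in (None, "", [])]


payload = [{"id": "ID1000", "tag": ["A test value", "A boosted value"]}]
for doc in payload:
    missing = missing_required(doc)
    if missing:
        print(f"{doc.get('id', '<no id>')}: missing required fields {missing}")
```

Running the check on every document in the payload makes it easier to tell a schema problem apart from a genuine JSON syntax problem.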
