Multi-language support for cook trait - google-smart-home

We are trying to add multi-language support to our Google Home application. We need to support English (US), English (CA), and French (CA).
To do that, we specify the lang property in the SYNC response with a locale such as en-us, en-ca, or fr-ca:
Note: the synonyms for en-US and en-CA may differ.
"requestId":,
"payload":{
"agentUserId":"xxx",
"devices":[
{
"id":"xxx",
"type":"xxx",
"traits":[
"action.devices.traits.Cook",
"action.devices.traits.OnOff"
],
"name":{
"defaultNames":[
"xxx"
],
"name":"xxx",
"nicknames":[
"xxxx"
]
},
"willReportState":true,
"attributes":{
"supportedCookingModes":[
"BREW",
"BOIL"
],
"foodPresets":[
...,
{
"food_preset_name":"water",
"supported_units":[
"CUPS",
"OUNCES",
"NO_UNITS"
],
"food_synonyms":[
{
"synonym":[
"hot water",
"water",
],
"lang":"en-us"
},
{
"synonym":[
"hot water",
"water",
],
"lang":"en-ca"
},
{
"synonym":[
"eau chaude",
"eau",
],
"lang":"fr-ca"
}
]
}
]
},
"deviceInfo":{
"manufacturer":"xxx",
"model":"xxx",
"hwVersion":"1.0",
"swVersion":"1.0"
}
}
]
}
}
With these settings, about 50% of the phrases do not work.
It worked with "lang": "en" when we supported English only, but after adding the additional languages it stopped working.
The languages selected for the project are shown in the attached screenshot.
What are we missing?

The notation you seem to be using for lang in the SYNC response is BCP-47. These tags are typically of the form language-region, where language is the primary language and region is the region (usually a country identifier) of the particular dialect. For example, English may be represented as American English (en-US) or British English (en-GB). A comprehensive list of the languages supported by the Google Assistant Service can be found here:
https://developers.google.com/assistant/sdk/reference/rpc/languages?hl=en
When it comes to Smart Home, however, you should provide just a language instead of a language-region pair; Assistant for Smart Home handles all regions of that primary language. The following changes to the SYNC response for the Cook trait should get it fully working (see the corrected snippet after this list):
- Change "en-us" and "en-ca" in the lang field of the food_synonyms attribute to just "en".
- Change "fr-ca" in the lang field of the food_synonyms attribute to just "fr".
- List all the synonyms for the different regions under these merged languages.
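For example, the food_synonyms block from the SYNC response above would collapse to one entry per primary language, with the regional synonyms merged under it (a sketch based only on the synonyms shown in the question):
"food_synonyms": [
  {
    "synonym": [
      "hot water",
      "water"
    ],
    "lang": "en"
  },
  {
    "synonym": [
      "eau chaude",
      "eau"
    ],
    "lang": "fr"
  }
]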
More information about the languages currently supported by Smart Home (including the Cook and OnOff traits) can be found here:
https://developers.google.com/assistant/smarthome/traits#supported-languages

Related

Date field not recognized with Azure Form Recognizer

I have built a custom neural model using Form Recognizer Studio and marked the date fields when labeling the data to build the model.
I have problems extracting the exact date value using the following Java SDK:
com.azure:azure-ai-formrecognizer:4.0.0-beta.5
The returned JSON (as previewed in the Form Recognizer Studio) is:
"Start Date": {
"type": "date",
"content": "01.05.2022",
"boundingRegions": [
{
"pageNumber": 1,
"polygon": [
1.6025,
4.0802,
2.148,
4.0802,
2.148,
4.1613,
1.6025,
4.1613
]
}
],
"confidence": 0.981,
"spans": [
{
"offset": 910,
"length": 10
}
]
}
When using the Java SDK, getValueDate() returns null, while getContent() returns the correct string value.
Most likely the issue occurs because the document is not in English, so the date format is not recognized. According to the documentation at https://learn.microsoft.com/en-us/azure/applied-ai-services/form-recognizer/language-support, only English is supported for custom neural models.
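For comparison, when the service does normalize a date, the field carries a valueDate property (which getValueDate() maps to) alongside content. A sketch of roughly what the same field would look like in that case, with the extra property assumed from the same analyze-result format:
"Start Date": {
  "type": "date",
  "valueDate": "2022-05-01",
  "content": "01.05.2022",
  "confidence": 0.981
}
Until that happens for non-English documents, one workaround is to read getContent() and parse the dd.MM.yyyy string yourself.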

Does Google Assistant handle any translations for synonyms/values given in a specific language?

I am curious as to how Google Assistant handles synonyms/values given in a specific language within a trait. E.g.:
{
  "availableFanSpeeds": {
    "speeds": [
      {
        "speed_name": "S1",
        "speed_values": [
          {
            "speed_synonym": [ "low", "speed 1", ... ],
            "lang": "en"
          },
          …
        ]
      },
      {
        "speed_name": "S2",
        "speed_values": [
          {
            "speed_synonym": [ "high", "speed 2", ... ],
            "lang": "en"
          },
          …
        ]
      },
      ...
    ],
    "ordered": true
  },
  "supportsFanSpeedPercent": true,
  "reversible": true
}
Does it try to map any translations automatically?
For example, I noticed for FanSpeed that 'low' and 'high' are understood in a different language even though they are specified in the FanSpeed trait only in English. Yet with the ArmDisarm trait, a mode called 'away' is not understood in a different language when only given in English.
This seems inconsistent - is there any pattern to understand when Google attempts to automatically translate a mode/synonym/value versus not?
Google Assistant handles translations at the trait level for Smart Home device types and traits. Natural Language Understanding (NLU) is a very complex subject, and Google is always looking to improve its systems. If you create a request in the bug tracker, we can make sure these cases are covered: issuetracker.google.com/issues/…
Depending on the language, it is also possible that the ArmDisarm command was given in a language that is not supported by that device trait. You can check the list of languages supported by the device traits at https://developers.google.com/assistant/smarthome/traits#supported-languages

How do I restrict medication annotations to a specific document section via IBM Watson Annotator for Clinical Data (ACD)

I’m using the IBM Watson Annotator for Clinical Data (ACD) API hosted in IBM Cloud to detect medication mentions within discharge summary clinic notes. I’m using the out-of-the-box medication annotator provided with ACD.
I’m able to detect and extract medication mentions, but I ONLY want medications mentioned within “DISCHARGE MEDICATIONS” or “DISCHARGE INSTRUCTIONS” sections.
Is there a way I can restrict ACD to only return medication mentions that appear within those two sections? I’m only interested in discharge medications.
For example, given the following contrived (non-PHI) text:
“Patient was previously prescribed cisplatin.DISCHARGE MEDICATIONS: 1. Aspirin 81 mg orally once daily.”
I get two medication mentions: one over “cisplatin” and another over “aspirin” - I only want the latter, since it appears within the “DISCHARGE MEDICATIONS” section.
Since the ACD medication annotator captures section headings as part of the mention annotations that appear within a section, you can define an inclusive filter that checks for (1) the desired normalized section headings and (2) the existence of the section-heading fields in general, in case a mention appears outside of any section and therefore has no section-header fields on the annotation. This filters out any medication mentions from the ACD response that don't appear within a "DISCHARGE MEDICATIONS" section. I added a couple of other related normalized section headings so you can see how that's done; feel free to modify the sample below to meet your needs.
Here's a sample flow you can persist via POST /flows and then reference on the analyze call as POST /analyze/{flow_id} - e.g. POST /analyze/discharge_med_flow:
{
  "id": "discharge_med_flow",
  "name": "Discharge Medications Flow",
  "description": "Detect medication mentions within DISCHARGE MEDICATIONS sections",
  "annotatorFlows": [
    {
      "flow": {
        "elements": [
          {
            "annotator": {
              "name": "medication",
              "configurations": [
                {
                  "filter": {
                    "target": "unstructured.data.MedicationInd",
                    "condition": {
                      "type": "all",
                      "conditions": [
                        {
                          "type": "all",
                          "conditions": [
                            {
                              "type": "match",
                              "field": "sectionNormalizedName",
                              "values": [
                                "Discharge medication",
                                "Discharge instructions",
                                "Medications on discharge"
                              ],
                              "not": false,
                              "caseInsensitive": true,
                              "operator": "equals"
                            },
                            {
                              "type": "match",
                              "field": "sectionNormalizedName",
                              "operator": "fieldExists"
                            }
                          ]
                        }
                      ]
                    }
                  }
                }
              ]
            }
          }
        ],
        "async": false
      }
    }
  ]
}
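Once the flow is persisted, the analyze call simply posts the note text against it, e.g. POST /analyze/discharge_med_flow with a body along these lines (a sketch; the unstructured/text container layout is assumed from the ACD analyze API):
{
  "unstructured": [
    {
      "text": "Patient was previously prescribed cisplatin.DISCHARGE MEDICATIONS: 1. Aspirin 81 mg orally once daily."
    }
  ]
}
With the filter above in place, the response should then contain only the aspirin mention from the DISCHARGE MEDICATIONS section.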
See the IBM Watson Annotator for Clinical Data filtering docs for additional details.
Thanks

Differences between Suggesters and NGram

I've built an index with a custom analyzer:
"analyzers": [
{
"#odata.type": "#Microsoft.Azure.Search.CustomAnalyzer",
"name": "ingram",
"tokenizer": "whitespace",
"tokenFilters": [ "lowercase", "NGramTokenFilter" ],
"charFilters": []
}
],
"tokenFilters": [
{
"#odata.type": "#Microsoft.Azure.Search.NGramTokenFilterV2",
"name": "NGramTokenFilter",
"minGram": 3,
"maxGram": 8
}
],
I came across suggesters and was wondering what the pros and cons are between these two approaches.
Basically, I'm building a JavaScript autocomplete text box. I need to do partial text search inside the search text (i.e. search=ell would match "Hello World").
Azure Search offers two features to enable this depending on the experience you want to give to your users:
- Suggestions: https://learn.microsoft.com/en-us/rest/api/searchservice/suggestions
- Autocomplete: https://learn.microsoft.com/en-us/rest/api/searchservice/autocomplete
Suggestions will return a list of matching documents even with incomplete query terms, and you are right that it can be reproduced with a custom analyzer that uses ngrams. It's just a simpler way to accomplish that (since we took care of setting up the analyzer for you).
Autocomplete is very similar, but instead of returning matching documents, it will simply return a list of completed "terms" that match the incomplete term in your query. This will make sure terms are not duplicated in the autocomplete list (which can happen when using the suggestions API, since as I mentioned above, suggestions return matching documents, rather than a list of terms).
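Both APIs rely on a suggester being defined on the index. A minimal sketch of that definition, assuming a searchable field named "title" (swap in your own source fields):
"suggesters": [
  {
    "name": "sg",
    "searchMode": "analyzingInfixMatching",
    "sourceFields": [ "title" ]
  }
]
The suggester's name ("sg" here) is then passed as the suggesterName parameter on the Suggestions and Autocomplete requests.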

Watson Conversation: Show user all entities

In Watson Conversation I have an entity Fruit.
Fruit values:
-Apple
-Orange
-Banana
-Kiwi
I'd like to create a new dialog where the intent of the user is to get a list of all the values of a specific entity, in this case a list of all the fruits. So the conversation should go:
User: "What fruits do you have?"
And then I'd like Watson to respond
Watson: "The fruits we got in store are: Apple, Orange, Banana, Kiwi"
Everything I found is about recognizing an entity in the user's input, such as:
User: "Do you have apples?"
And Watson picking up Apples
Just to clarify: declaratively setting an array with the possible options on a context variable, as shown below, is no good for me; I need to get them dynamically from the entity.
{
  "context": {
    "fruits": [
      "lemon",
      "orange",
      "apple"
    ]
  },
  "output": {
    "text": {
      "values": [
        "This is the array: <? $fruits.join(', ') ?>"
      ],
      "selection_policy": "sequential"
    }
  }
}
Thanks!
AFAIK it is not possible to directly access the workspace metadata from within a dialog. You have access to what was detected via intents, entities and context variables. However, I see two options:
- Use the application program that drives the chat to access the entity definitions on the fly, then create a context variable in which you offer the entity choices. The API to list entity values can be used from any programming language, and there are SDKs (see the sketch below).
- With a relatively new feature you can invoke server or client actions from within a dialog node, i.e. make programmatic calls. Use that together with the API mentioned above to obtain the list of entity values.
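For the first option, the workspace API exposes the values directly via GET /v1/workspaces/{workspace_id}/entities/Fruit/values (entity name taken from the question). A rough sketch of the response, trimmed to the relevant fields (exact property set assumed from the v1 API):
{
  "values": [
    { "value": "Apple", "type": "synonyms" },
    { "value": "Orange", "type": "synonyms" },
    { "value": "Banana", "type": "synonyms" },
    { "value": "Kiwi", "type": "synonyms" }
  ],
  "pagination": { ... }
}
Your application can map the value fields into the fruits context variable shown in the question and let the dialog print them with $fruits.join(', ').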
