I am working on an Alexa skill and am having trouble getting Alexa to understand my voice input. As a result, the utterances are not matched to the required slots, and Alexa keeps re-asking or gets stuck.
Here are some examples (intended word: what Alexa hears):
affirm: f.m., a from
Speedbird: Speedboard, speaker, speed but, speed bird, spirit, speedbath
wind: windies (wind is), when is home (wind is calm)
runway 03: runway sarah three
takeoff: the cough
Is there a way to train Alexa to understand me properly? Or should I just add all these "false" utterances as sample utterances so that Alexa matches my intents properly?
Thanks for any help!
There is no way to train Alexa's language understanding itself.
Yes, as you wrote: I would just add these false utterances as matches for your intent.
This also seems to be what Amazon recommends:
...might show you that sometimes Alexa misunderstands the word "mocha" as
"milk." To mitigate this issue, you can map an utterance directly to
an Alexa intent to help improve Alexa's understanding within your
skill. ....
two common ways to improve ASR accuracy are to map an intent value or
a slot value to a failing utterance
It may also be worth having another person try, to see whether their speech is recognized the same way as yours.
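One way to act on that recommendation is to list the misrecognitions as synonyms of the intended slot value. The sketch below is only an illustration: the slot type name RadioTermType and the helper build_slot_type are hypothetical, and the misheard phrases are the ones reported in the question. The output is shaped like the "types" entry of an interaction model.

```python
import json

# Misrecognitions taken from the question, keyed by the value the user meant.
MISHEARD = {
    "Speedbird": ["speedboard", "speed bird", "speedbath"],
    "takeoff": ["the cough"],
}


def build_slot_type(name, misheard):
    """Build a custom slot type whose values carry the misrecognitions as synonyms."""
    return {
        "name": name,
        "values": [
            {"name": {"value": value, "synonyms": synonyms}}
            for value, synonyms in misheard.items()
        ],
    }


# "RadioTermType" is a hypothetical slot type name.
slot_type = build_slot_type("RadioTermType", MISHEARD)
print(json.dumps(slot_type, indent=2))
```

The printed JSON could then be pasted into the skill's interaction model and extended as new misrecognitions show up in testing.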
Word-Only Slots
If you're still struggling with this, try adding more variations to your slot values (synonyms are an option if specific misinterpretations keep repeating). Consider adding synonyms like speed bird for Speedbird (and take off for takeoff). Non-standard words in slots will not resolve as accurately as common words. By breaking Speedbird into two words, Alexa should recognize the slot more reliably. Information about synonyms is here:
https://developer.amazon.com/en-US/docs/alexa/custom-skills/define-synonyms-and-ids-for-slot-type-values-entity-resolution.html#sample-slot-type-definition-and-intentrequest
Once you've done this, you'll want to grab the canonical value of the slot, not the interpreted value (e.g. you want Speedbird not speedboard).
To see an example, scroll to the very last JSON code block. In the scenario described there, the user said the word track, which is a synonym for the slot value song. You'll see the MediaType value is track (what the user said), but if you look at the resolutions object, inside the values array, the first value object is the actual slot value song (what you want) associated with the synonym.
This StackOverflow question goes a little more into the details of how you get that value:
How do I get the canonical slot value out of an Alexa request
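As a rough sketch of reading that value in a Python backend: the function below walks the resolutions object of a single slot and returns the canonical name when entity resolution succeeded, otherwise the spoken value. The function name canonical_slot_value and the trimmed sample slot are illustrative; the field names follow the IntentRequest structure described above.

```python
def canonical_slot_value(slot):
    """Return the resolved (canonical) slot value, falling back to the spoken value."""
    authorities = slot.get("resolutions", {}).get("resolutionsPerAuthority", [])
    for authority in authorities:
        if authority.get("status", {}).get("code") == "ER_SUCCESS_MATCH":
            # The first entry in "values" is the best match for the spoken synonym.
            return authority["values"][0]["value"]["name"]
    return slot.get("value")


# Trimmed slot object shaped like the documentation example:
# the user said "track", which is a synonym of the slot value "song".
media_type_slot = {
    "name": "MediaType",
    "value": "track",
    "resolutions": {
        "resolutionsPerAuthority": [
            {
                "status": {"code": "ER_SUCCESS_MATCH"},
                "values": [{"value": {"name": "song", "id": "SONG"}}],
            }
        ]
    },
}

print(canonical_slot_value(media_type_slot))  # song
```

Falling back to the spoken value keeps the handler working for slots that have no synonyms defined, since those requests may not carry a resolutions object at all.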
Word and Number Slots
In the case of the "runway 03" example, consider breaking this into two different slots, e.g. {RunwaySlot : Custom} {Number : AMAZON.NUMBER}. You'll have better luck this way with these more complex slots. The same is true for an example like "red airplane": you'll want to break it into two slots, {Color : AMAZON.Color} {VehicleSlot : Custom}.
https://developer.amazon.com/en-US/docs/alexa/custom-skills/slot-type-reference.html#number
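Once the number arrives in its own slot, the backend can rebuild the runway designator. The sketch below assumes the AMAZON.NUMBER slot value arrives as a digit string such as "3" and that the skill wants the two-digit form "03"; the helper name runway_designator is hypothetical.

```python
def runway_designator(number_slot_value):
    """Turn a digit string from the number slot (e.g. "3") into a two-digit runway number."""
    number = int(number_slot_value)
    # Runway numbers are based on magnetic heading and range from 01 to 36.
    if not 1 <= number <= 36:
        raise ValueError(f"not a valid runway number: {number}")
    return f"{number:02d}"


print(runway_designator("3"))   # 03
print(runway_designator("27"))  # 27
```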
Related
I have a skill that elicits a U.S. state and county from the user and then retrieves some data. The backend is working fine, but I am concerned about how to structure the conversation. So far, I have created an intent called GetInfoIntent, which has two custom slots, state_name and county_name.
There are about 3,000 U.S. counties with many duplicate names. It seems silly to me that I am asking for a county without first "narrowing down" by state. Another way I can think of to structure the conversation is to have 50 intents: GetNewHampshireInfo, GetCaliforniaInfo, etc. If I did it this way, I'd need a custom slot type for each state, like nh_counties, ca_counties, etc.
This must be a pretty generic problem. Is there a standard approach, or best practice, I can use?
My (not necessarily best practice) practice tips:
One slot per data type. That is, have only one slot for a four-digit number, even if you use it in more than one place for two different things in the skill.
As few intents as you need, no more, no less. You certainly can and should break up the back-end code with helper code, but try not to break the intents into too many smaller pieces. Doing so can make it harder for Alexa to choose the intended intent.
Keep it voice focused. How would you ask in a conversation? Voice-first development is always the way to go.
For the slot filling I think it is fine to ask both state and county.
If the matching is not correct ask for confirmation.
Another option is to not use auto filling within the Alexa skill and use the dialog interface. Ask the county first and then only when it has more than one state option and is ambiguous continue the dialog to fill the state.
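A minimal sketch of that county-first flow, written as plain Python so it is independent of any SDK. The lookup table is a tiny illustrative subset (real data would cover all counties), and the function name next_step and the returned action labels are hypothetical.

```python
# Illustrative subset: county name -> states that have a county with that name.
COUNTY_TO_STATES = {
    "clark": ["Kentucky", "Missouri"],
    "coos": ["New Hampshire", "Oregon"],
    "yolo": ["California"],
}


def next_step(county, state=None):
    """Decide whether the state still has to be elicited for the given county."""
    states = COUNTY_TO_STATES.get(county.strip().lower(), [])
    if not states:
        return ("reprompt_county", None)   # unknown county: ask again
    if len(states) == 1:
        return ("fulfill", states[0])      # unambiguous: no need to ask for the state
    if state in states:
        return ("fulfill", state)          # ambiguous, but the state is already known
    return ("elicit_state", None)          # ambiguous: continue the dialog


print(next_step("Yolo"))               # ('fulfill', 'California')
print(next_step("Clark"))              # ('elicit_state', None)
print(next_step("Clark", "Missouri"))  # ('fulfill', 'Missouri')
```

In a real handler, the "elicit_state" result would translate into a dialog directive that asks for the state slot, and "fulfill" into the data lookup.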
Even if you did have 50 separate intents, you really never want two slots that can be filled by the same word. For example, having mo_counties and ky_counties slot types that are both satisfied by Clark is ambiguous and can cause unneeded difficulty.
So for someone looking for the "best practice": I have learned that there isn't one yet (and maybe never will be). Do what makes sense for the conversation, and try to keep it as simple as it needs to be, and no simpler, on the back end.
I also find it helpful to find a non-developer to test your conversation flow.
This wasn't really technical and is all opinion, but that is a lot of what Alexa development is. I would suggest the Tuesday Alexa office hours at https://www.twitch.tv/amazonalexa; they are very helpful, and you can ask questions about things like this.
Whenever I am inside the skill and say one completely random word, the Fallback Intent is not triggered. The Echo just emits a sound, and the Alexa simulator shows nothing. But I know for a fact that I am still inside the skill and the session has not ended, because if I say an utterance that is mapped to a certain intent without including the word Alexa, it responds correctly. BUT, if I say TWO completely random words, the Fallback Intent is triggered. For example (already inside the skill), if I say the word "pizza", it just responds with that weird noise and stays in the current session. But if I say the words "pizza pie", it maps to the Fallback Intent.
I have observed this behavior in a skill that has many custom intents each having many utterances configured. But when I tried inputting the word "pizza" to a skill with only 3 custom intents, the Fallback intent works fine.
If, when you say the out-of-domain word, you get a reprompt and then an end of session, it means that Alexa assigned a very low confidence level to the mapping of that utterance to an intent. And this also applies to the fallback intent!
Every time you build your model, an out-of-domain model for fallback is built in parallel. That model is supposed to catch out-of-domain utterances, but it's not perfect. Only utterances with a high confidence of matching the fallback model will be routed to the fallback intent. This is by design (for the current version), meaning that not every utterance for which fallback is the candidate will trigger fallback; low-confidence ones will not. So what you're seeing here is an utterance that generates a low confidence for fallback (fallback is the best candidate, but the confidence is too low). As fallback gets better, it will become more effective at capturing these cases. A rather awkward solution (which defeats the purpose of fallback, I guess) would be to extend fallback with sample one-word utterances similar to the ones you're trying. Hope this helps...
Update: FallbackIntent sensitivity tuning was added recently, so now, if you set it to high in the voice interaction model, it will work as you expected!
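For reference, here is a sketch of what that setting looks like when the interaction model JSON is edited from a script. As far as I know, the sensitivity lives under languageModel.modelConfiguration.fallbackIntentSensitivity.level and accepts LOW, MEDIUM, or HIGH; treat that path as an assumption and check it against the current interaction model schema. The helper name and the sample model are hypothetical.

```python
import copy

ALLOWED_LEVELS = {"LOW", "MEDIUM", "HIGH"}


def set_fallback_sensitivity(model, level="HIGH"):
    """Return a copy of the interaction model with FallbackIntent sensitivity set."""
    if level not in ALLOWED_LEVELS:
        raise ValueError(f"level must be one of {sorted(ALLOWED_LEVELS)}")
    updated = copy.deepcopy(model)  # leave the caller's model untouched
    language_model = updated["interactionModel"]["languageModel"]
    config = language_model.setdefault("modelConfiguration", {})
    config["fallbackIntentSensitivity"] = {"level": level}
    return updated


# Hypothetical, heavily trimmed interaction model.
model = {
    "interactionModel": {
        "languageModel": {
            "invocationName": "my skill",
            "intents": [{"name": "AMAZON.FallbackIntent", "samples": []}],
        }
    }
}

tuned = set_fallback_sensitivity(model, "HIGH")
print(tuned["interactionModel"]["languageModel"]["modelConfiguration"])
```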
I think I just found the solution...
If any intent has a single-word sample utterance, Alexa tends (mistakenly, obviously) to match a single-word user input to that intent!
As far as I can see, Alexa uses the terms and the term count to calculate the statistical match with the user's input... wow!
Hope it helps you guys!
I am creating a skill and need a slot type for my intent that takes a complete sentence as input, but it has to work in English (India).
Something like: AMAZON.LITERAL
It only supports English (UK) and English (US).
I need any slot type that takes a complete sentence as input but supports English (India). Thanks.
I have also been searching for this for the last 2 weeks, and I emailed the Alexa support team. They confirmed that the AMAZON.LITERAL slot type is not a supported feature in English (India).
You can only fake it:
Create your own slot type.
For this slot type, provide a lot of different values of different lengths (like "one", "this is a sentence.." and so on).
Use your slot with the custom slot type in your utterances.
I also needed this for a simple echo skill in GERMAN that just replies with what was said.
See the dialogue model here:
https://gitlab.com/timguy/alexa-wiederhall/blob/master/src/main/java/github/timguy/wiederhall/speechAssets/IntentSchemaSlotsUtterances_SkillBuilder.json
Instead of the sample "{slotOne}", you could use "save the text I say {slotOne}".
UPDATE
I haven't tried it, but I just read that you can now use AMAZON.SearchQuery:
https://developer.amazon.com/de/blogs/alexa/post/a2716002-0f50-4587-b038-31ce631c0c07/enhance-speech-recognition-of-your-alexa-skills-with-phrase-slots-and-amazon-searchquery
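A small sketch of what such an intent could look like, built as a Python dict so it can be dumped to JSON. The intent name CaptureTextIntent is hypothetical; the slot name and sample phrase are the ones from this answer. My understanding is that AMAZON.SearchQuery samples need a carrier phrase around the slot (a sample consisting only of the slot is rejected), so the helper below checks for that; treat it as an assumption and verify it against the slot type reference.

```python
import json

# Hypothetical intent; "slotOne" and the carrier phrase come from the answer above.
capture_intent = {
    "name": "CaptureTextIntent",
    "slots": [{"name": "slotOne", "type": "AMAZON.SearchQuery"}],
    "samples": ["save the text I say {slotOne}"],
}


def has_carrier_phrase(sample):
    """True if the sample contains words besides the slot placeholder."""
    words = [w for w in sample.split() if not (w.startswith("{") and w.endswith("}"))]
    return len(words) > 0


for sample in capture_intent["samples"]:
    if not has_carrier_phrase(sample):
        raise ValueError(f"sample needs a carrier phrase: {sample!r}")

print(json.dumps(capture_intent, indent=2))
```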
AMAZON.LITERAL is deprecated, and if you change the skill to en-US in order to use it, the skill won't be available in any other locale, which I think you would not want.
Moreover, as per my interaction with the Amazon development team, free text via AMAZON.LITERAL is not a good option, because it adds ambiguity to their NLP when resolving intents and slots rather than addressing the underlying issue. Since Alexa doesn't give you any confidence score for the intent or slot, this would be a big problem for your skill, as any random word or sentence would match AMAZON.LITERAL.
It's always good to restrict your user input when developing a skill, especially when it involves AI/NLP.
Update
You can use the new slot type AMAZON.SearchQuery, which should suit your problem.
I am developing an Alexa skill in which I have a slot for names of fruits. However, if I say something like "What is apple's cost", where the slot value has an apostrophe, Alexa does not seem to recognize the apostrophe. A workaround is to say something like "What is the cost of an apple", but that would not be the best customer experience.
How can I make Alexa understand slot value with apostrophes? Any help is appreciated.
I think this is what you are looking for.
Create Intents, Utterances, and Slots (Rules for Sample Utterances)
If the word for a slot value may have apostrophes indicating the
possessive, or any other similar punctuation (such as periods or
hyphens) include those within the brackets defining the slot. Do not
add 's after the closing bracket. For example: ...
My friend, the apostrophe may be handled internally by the voice recognition system, but Alexa will never hear an apostrophe as such in real time.
Good news though: you don't need the apostrophe. Think about it: Alexa only recognizes what the customer says, without capital letters or special characters. That means that if the customer says "What is apple's cost", Alexa would recognize it as "what is apples cost". This is a problem to handle server-side, because you only need to understand what the customer meant. You could implement a server-side string matching function using the Levenshtein distance.
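A minimal sketch of that server-side matching, using a textbook dynamic-programming Levenshtein distance. The fruit list and the helper name closest_fruit are hypothetical; a real skill would use its own slot values and probably reject matches above some distance threshold.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep when equal)
            ))
        previous = current
    return previous[-1]


def closest_fruit(heard, fruits):
    """Return the known fruit with the smallest edit distance to the heard word."""
    return min(fruits, key=lambda fruit: levenshtein(heard.lower(), fruit.lower()))


FRUITS = ["apple", "banana", "orange"]  # hypothetical slot values

print(levenshtein("apples", "apple"))   # 1
print(closest_fruit("apples", FRUITS))  # apple
```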
Alexa just doesn't understand the word 'postpaid', and I've tried it a million times in my skill. I also tried "Alexa, Simon says postpaid", but it repeats something other than postpaid, and I don't know why. My sample utterance looks like "what is the {type} sales", and the type slot has custom values such as "postpaid".
I've looked at AMAZON.LITERAL but didn't quite understand whether it would help in my case. Any workaround would be helpful. Thanks in advance.
What does Alexa think you said? Maybe you can use that in your intent also. Your code can check for and replace whatever that is to "postpaid".
This is a bit of a hack, but may work for you until Amazon provides us with a way to fine tune input.
Alexa will not always restrict the transcription of a slot to the given values, especially if you have a large list of possible values. Whether you use a list or AMAZON.LITERAL, your best bet in this case may be to check whether the identified value is in fact one of the values in your list and use it if so; otherwise, you can use a phonetic matching or similarity algorithm to select the closest value.
Hit me up if you need example code (in Python in my case)
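For illustration, here is a rough sketch of that check-then-match idea using only the standard library. Note that difflib measures character-level similarity, not phonetic similarity, so it stands in for the phonetic algorithm mentioned above. Apart from "postpaid", the slot values are hypothetical, as is the helper name resolve_type.

```python
from difflib import get_close_matches

# "postpaid" is from the question; the other values are hypothetical.
SLOT_VALUES = ["postpaid", "prepaid", "broadband"]


def resolve_type(heard, cutoff=0.6):
    """Return the slot value matching what Alexa heard, or None if nothing is close."""
    candidate = heard.strip().lower()
    if candidate in SLOT_VALUES:
        return candidate  # exact hit: use it as is
    # Otherwise pick the most similar known value, if any clears the cutoff.
    matches = get_close_matches(candidate, SLOT_VALUES, n=1, cutoff=cutoff)
    return matches[0] if matches else None


print(resolve_type("Postpaid"))   # postpaid
print(resolve_type("post paid"))  # postpaid
print(resolve_type("xyz"))        # None
```

Returning None for poor matches lets the skill reprompt instead of silently picking a wrong value.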
This feels simplistic but have you tried breaking postpaid into two words?
{type} == "post paid"
Slots can contain multi word utterances. Perhaps Alexa will recognize the two distinct morphemes.