An issue I am seeing is that when I ask in Dialogflow for the user to spell out their user ID, like joesmith2014, there are a large number of errors. The following post suggests that I can fix this by using a speech context to tell the speech-to-text engine that the user will be spelling out alphanumerics.
https://stackoverflow.com/questions/62048288/dialogflow-regex-alphanumeric-speech
I can't figure out how you would do this while using the actions-on-google library, or can this not be done in the fulfillment webhook?
Thanks.
As an example, I created an entity called “alphanumeric”, which will accept any alphanumeric value I send, by following these steps:
Check the Regexp entity box
Add a single entry, ^[a-zA-Z0-9]{3}[a-zA-Z0-9]*$
Then save it
Your agent should look something like this:
Please note that the regexp entity I added is strict in that it is looking only for a string of alphanumerics, without any spaces or dashes. This is important for two reasons:
This regexp follows the auto speech adaptation requirements for enabling the "spelled-out sequence" recognizer mode.
By not looking for spaces and only looking for entire phrases (^...$), you allow end-users to easily exit the sequence recognition. For example, when you prompt "what's your order number" and an end-user replies "no I want to place an order", the regexp will reject that input and Dialogflow will know to look for another intent that might match the phrase (a short sketch of this accept/reject behaviour follows below).
If you are only interested in numeric values, you can create a more tailored entity like [0-9]{3}[0-9]*, or even just use the built-in @sys.number-sequence entity.
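To see that accept/reject behaviour in isolation, here is a tiny standalone sketch that runs the same pattern against a spelled-out ID and an unrelated phrase. It only exercises the regular expression itself, not Dialogflow; the sample strings are placeholders:

```typescript
// Quick sanity check of the regexp entry's accept/reject behaviour.
const alphanumeric = /^[a-zA-Z0-9]{3}[a-zA-Z0-9]*$/;

const samples = [
  "joesmith2014",                // spelled-out user id -> matches
  "no I want to place an order", // contains spaces -> rejected, other intents can match
  "AB-123",                      // dash -> rejected by this strict pattern
];

for (const s of samples) {
  console.log(`${s} -> ${alphanumeric.test(s)}`);
}
```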
Dialogflow ES fulfillment cannot affect speech recognition quality, because speech-to-text processing happens before the request is sent to Dialogflow fulfillment. Check the diagram in the Dialogflow ES Basics documentation.
You can improve speech recognition quality either by enabling auto speech adaptation in the agent settings or by sending speech contexts in the Dialogflow API requests. Note that speech contexts sent via API override implicit speech context hints generated by auto speech adaptation.
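If your integration calls the Dialogflow ES API directly (rather than relying on a built-in integration such as Actions on Google), the detect-intent audio request can carry speech contexts. A minimal sketch, assuming the @google-cloud/dialogflow Node.js client and a raw LINEAR16 recording on disk; the project ID, session ID, file name and hint phrases are placeholders:

```typescript
import { SessionsClient } from "@google-cloud/dialogflow";
import { readFileSync } from "fs";

async function detectWithSpeechContext(projectId: string, sessionId: string) {
  const sessionClient = new SessionsClient();
  const session = sessionClient.projectAgentSessionPath(projectId, sessionId);

  const [response] = await sessionClient.detectIntent({
    session,
    queryInput: {
      audioConfig: {
        audioEncoding: "AUDIO_ENCODING_LINEAR_16",
        sampleRateHertz: 16000,
        languageCode: "en-US",
        // Speech context hints: tell the recognizer that spelled-out IDs are likely.
        speechContexts: [
          { phrases: ["joesmith2014", "j o e s m i t h 2 0 1 4"], boost: 15 },
        ],
      },
    },
    inputAudio: readFileSync("user_id_utterance.raw"),
  });

  return response.queryResult;
}
```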
If you use regexp entities, make sure that your agent design meets all the requirements listed in the speech adaptation with regexp entities documentation; it also shows an example of what an intent that collects an employee ID and satisfies these requirements might look like.
When testing the agent, make sure that you test it consistently via voice, including the inputs preceding the utterance expected to match a regexp entity.
This tutorial for iterative confirmation of spoken sequences may also help with the agent design.
Related
I have written a smart speaker app for Google Home using DialogFlow, and am now in the process of porting it over to Alexa.
One of the fundamental differences seems to be the inability to easily trigger follow-up intents. For example, I have a dialog that asks the user a series of questions, one after the other, before providing a result based on the answers provided. e.g. ({slot types})
Do you like a low maintenance or working garden? {low maintenance}{working}
Do you like a garden you can relax in? {yes/no}
Would you like to grow vegetables in your garden? {yes/no}
This is easy to achieve using DialogFlow follow-up intents, but I have no clue where to start with Alexa, and there don't seem to be many examples out there. All I can find seems to focus on slot filling for a single dialog.
I am using my own API service to serve results (vs Lambda).
Can anybody recommend a way of achieving this in an Alexa Skill?
I managed to achieve this by adding a single utterance with three individual slots, one for each of the answers required:
inspire me {InspireMaintenance} {InspireRelax} {InspireVeg}
These slots are backed by one slot type, Custom_YesNo, which has Yes and No values plus synonyms. My C# service then checks each of these required slots, and where one is missing it returns the relevant question as a response. Once all slots are filled it provides the answer.
Not as intuitive as Dialogflow and requires code to achieve what can be done without code in DF, but at least it works :)
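For reference, a rough TypeScript equivalent of that slot-checking logic (using ask-sdk-core rather than a C# service) might look like the sketch below. Only the slot names come from the answer above; the intent name and question wording are illustrative:

```typescript
import { getIntentName, getRequestType, HandlerInput, RequestHandler } from "ask-sdk-core";
import { IntentRequest, Response } from "ask-sdk-model";

// Question to ask for each still-empty answer slot (wording is illustrative).
const questions: Record<string, string> = {
  InspireMaintenance: "Do you like a low maintenance or working garden?",
  InspireRelax: "Do you like a garden you can relax in?",
  InspireVeg: "Would you like to grow vegetables in your garden?",
};

export const InspireMeIntentHandler: RequestHandler = {
  canHandle(handlerInput: HandlerInput): boolean {
    return getRequestType(handlerInput.requestEnvelope) === "IntentRequest"
      && getIntentName(handlerInput.requestEnvelope) === "InspireMeIntent"; // hypothetical intent name
  },
  handle(handlerInput: HandlerInput): Response {
    const intent = (handlerInput.requestEnvelope.request as IntentRequest).intent;
    const slots = intent.slots ?? {};

    // Elicit the first answer slot that has not been filled yet.
    for (const slotName of Object.keys(questions)) {
      if (!slots[slotName]?.value) {
        return handlerInput.responseBuilder
          .speak(questions[slotName])
          .addElicitSlotDirective(slotName, intent)
          .getResponse();
      }
    }

    // All three answers collected: build the recommendation from the slot values.
    return handlerInput.responseBuilder
      .speak("Thanks! Based on your answers, here is a garden suggestion for you.")
      .getResponse();
  },
};
```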
Is it possible to restrict an AVS device (a device running Alexa) to a single skill? So if I built an AI skill and have it running on a device, is it possible to keep the experience inside the custom skill so I don't have to keep saying "Alexa, open ..."
One trick you can do with AVS is to prepend every single request with a sound clip equivalent to: "ask <your invocation name> to ...". It's definitely a hack, but I was able to use it with some success.
See my write-up here: https://www.linkedin.com/pulse/adding-context-alexa-will-blaschko-ma
The relevant parts (in case the link goes away):
Regular voice commands don't carry any extra information about the user, but I wanted to find a way to tack on metadata to the voice commands, and so I did just that: glued it right onto the end of the command and updated my intents to know what the new structure would be.
...
In addition to facial recognition, voice recognition could help identify users, but let's not stop there. Any amount of context can be added to a request based on available local data.
"Find frozen yogurt nearby" could silently become "Alexa open Yelp and find frozen yogurt near 1st and Pine, Seattle" using some built-in geolocation in the device (phone, in this case).
I also use something similar in my open source Android Alexa library to send prerecorded commands: https://github.com/willblaschko/AlexaAndroid
I think you are looking for AWS Lex, which allows you to write Alexa-like skills without the rest of the Alexa feature set.
http://docs.aws.amazon.com/lex/latest/dg/what-is.html
Is it possible to launch an Alexa App with just its name? This is similar to when you ask it what the weather is.
"Alexa, weather"
However I would like to be able to say
"Alex, weather in Chicago" and have it return that value
I can't seem to get the app to launch without a connecting word. Things like ask, open, and tell count as connecting words.
I have searched the documentation but can't find mention of it, however there are apps in the app store that do this.
It is documented in the first item here.
I've verified that this works with my own skill. One thing I've noticed is that Alexa's speech recognition is much worse when invoked in this manner presumably because it requires matching against a greater set of possible words. I have to really enunciate in a quiet room to get Alexa to recognize my invocation name in this context.
When developing a custom skill you have to use the connecting words, e.g. "Alexa, ask <your invocation name> to do something."
If you want to pass a variable, you have to specify the sample utterances:
OneshotTideIntent get high tide
OneshotTideIntent get high tide for {City} {State}
Then you handle the cases in your code where the user does not provide these values. For examples see https://github.com/amzn/alexa-skills-kit-js
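As a small illustration of that missing-value handling (using ask-sdk-model types rather than the older sample code linked above), you could fall back to defaults when the optional {City} and {State} slots are absent; the default city and state here are placeholders:

```typescript
import { IntentRequest } from "ask-sdk-model";

interface TideQuery {
  city: string;
  state: string;
}

// Resolve the city/state for the tide lookup, falling back to defaults
// when the user only said "get high tide" without a location.
function resolveTideQuery(request: IntentRequest): TideQuery {
  const slots = request.intent.slots ?? {};
  return {
    // slot.value is undefined when the user did not provide it
    city: slots.City?.value ?? "Seattle",
    state: slots.State?.value ?? "WA",
  };
}
```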
When writing the example phrases you use the following construct:
"Alexa, [connecting word] [your invocation name], [sample utterance]". As far as I have noticed she is rather picky and you have to be exact when invoking custom skill (the voice recognition works way better with built in skills)
EDIT: launching a skill without a connecting word is possible when developing a "smart home" skill.
I want to train NLC in such a way that:
If I give an input such as "Sharpies", "Cakes", or "iPhone6", then it should return "order" as the intent.
But it's not working for all products. The intent should be returned for every product name, ideally by training NLC on only a few product names and having it work for all products (dynamically).
As we have thousands of products, how can I get the intent "order" for all of them instead of adding every product name to the training .csv (I don't want to hard-code all the product names)?
Can you please help me retrieve the correct intent when any (dynamic) product name is given as input to NLC?
What you are trying to do is not what NLC is intended for.
The purpose of an intent is to understand what the end user is trying to achieve, not what products/keywords may appear in a sentence.
For example "I want to buy an iPhone" vs "I want to unlock my iPhone". Both mention iPhone but have two very different intents. In this case with training, you can distinguish between wanting to purchase, vs wanting to unlock.
One option you can try is looking at the AlchemyAPI entity extraction service.
Another option is to use Watson Explorer Studio, but you will need Watson Explorer to get it. Watson Knowledge Studio is coming soon, which, like WEX-Studio, allows you to build custom annotators. You can use these annotators with UIMA to parse your text.
So you could easily build something to understand that "I don't want to buy an iPhone" is not the same as "I want to buy an iPhone", and have it extract iPhone as a product.
There is an unsupported, old, free version of WEX-Studio called LanguageWare, if you want to see whether that can help. That site contains a manual and videos. Here is a video I made that gives an example of how you would use it.
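A purely illustrative sketch of that separation of concerns: let a classifier decide the intent from a handful of phrasings, and resolve the product separately against your own catalogue, so you never have to train on thousands of product names. The classifyIntent stub below is a hypothetical stand-in for an NLC call, not a real API, and the catalogue contents are placeholders:

```typescript
// Product names come from your own catalogue/database, not from classifier training data.
const catalogue = new Set(["sharpies", "cakes", "iphone6"]);

// Hypothetical stand-in for an intent classifier trained on phrasings like
// "I want to buy ..." vs "I want to unlock ..." rather than on product names.
async function classifyIntent(text: string): Promise<string> {
  return /\b(unlock|activate)\b/i.test(text) ? "unlock" : "order";
}

// Classify the intent, then look the product up separately.
async function interpret(text: string) {
  const intent = await classifyIntent(text);
  const product = text
    .toLowerCase()
    .split(/\s+/)
    .find((token) => catalogue.has(token));
  return { intent, product };
}

// interpret("I want to buy an iPhone6") resolves to { intent: "order", product: "iphone6" }
```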
Is there any plan to include a "did you mean" feature in the Google App Engine Full Text Search API? It would be very powerful to have a default way of testing full text queries. For example, the query "barcak obama" would generate: "Did you mean: barack obama?" It would also be good for the feature to be able to handle many different languages simultaneously.
Not at this time (but who knows what the future will bring); you would have to implement something on top of it yourself.
One way to get native support is to file a feature request and get enough people to vote for it. This will help set the right priorities.
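As a starting point for rolling your own, a minimal "did you mean" layer can keep a vocabulary of indexed terms and suggest the closest one by edit distance when a query term is unknown. A small sketch; the vocabulary and distance threshold are placeholders:

```typescript
// Levenshtein edit distance between two strings.
function editDistance(a: string, b: string): number {
  const dp = Array.from({ length: a.length + 1 }, (_, i) =>
    Array.from({ length: b.length + 1 }, (_, j) => (i === 0 ? j : j === 0 ? i : 0))
  );
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      dp[i][j] = Math.min(
        dp[i - 1][j] + 1,                                  // deletion
        dp[i][j - 1] + 1,                                  // insertion
        dp[i - 1][j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1) // substitution
      );
    }
  }
  return dp[a.length][b.length];
}

// Suggest the closest known term when the query term is not in the vocabulary.
function didYouMean(queryTerm: string, vocabulary: string[], maxDistance = 2): string | undefined {
  if (vocabulary.includes(queryTerm)) return undefined; // already a known term
  let best: { term: string; dist: number } | undefined;
  for (const term of vocabulary) {
    const dist = editDistance(queryTerm, term);
    if (dist <= maxDistance && (!best || dist < best.dist)) best = { term, dist };
  }
  return best?.term;
}

// didYouMean("barcak", ["barack", "obama", "election"]) -> "barack"
```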