I'm creating a seasonal Alexa skill with intents such as 'how many sleeps till Christmas', 'am I on the good list', etc., and I'd also like an intent that asks Alexa to sing Jingle Bells. The key part is making her sing it.
In my skill, for the singJingleBells intent, I output the lyrics for Jingle Bells as the speech response, but Alexa reads the lyrics (as expected, if I'm honest).
I've discovered there is a (presumably official Amazon) skill that makes her sing Jingle Bells: you can say "Alexa, sing Jingle Bells".
I would like my skill to do the same.
I'm guessing the Amazon skill does it with SSML phonetics or, more likely, a pre-recorded MP3 via either an SSML audio tag or an SSML speechcon interjection.
Is there any way to discover/capture the output response of the Amazon skill so that I can understand (and copy!) the way it does it?
Using Steve's idea, I can use the console on echosim.io to capture the SpeechSynthesizer directive. Not sure if this gets me any closer:
{
  "directive": {
    "header": {
      "dialogRequestId": "dialogRequestId-6688b290-80d3-4111-a29d-4c60c6d47c31",
      "namespace": "SpeechSynthesizer",
      "name": "Speak",
      "messageId": "c5771361-2a80-4b00-beb6-22a783a7c504"
    },
    "payload": {
      "url": "cid:b438a3ea-d337-4c5f-b719-816e429ed473#Alexa3P:1.0/2017/11/06/20/94a9a7c4112b44568bff10df69d30825/01:18::TNIH_2V.f000372f-b147-4bea-81fb-4c2e7de67334ZXV/0_359577804",
      "token": "amzn1.as-ct.v1.Domain:Application:Knowledge#ACRI#b438a3ea-d337-4c5f-b719-816e429ed473#Alexa3P:1.0/2017/11/06/20/94a9a7c4112b44568bff10df69d30825/01:18::TNIH_2V.f000372f-b147-4bea-81fb-4c2e7de67334ZXV/0",
      "format": "AUDIO_MPEG"
    }
  }
}
If I understand correctly, you want to get the Alexa audio output into an .mp3 file (or some other format) so that it can be played back again in a custom skill.
If that's the goal, you'll need to use the Alexa Voice Service (AVS) and more specifically the SpeechSynthesizer Interface to get the audio output that you'd then use in your custom skill response.
So, you'll be using both the Alexa Skills Kit (for the skill) and the Alexa Voice Service (AVS) to get the audio.
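As a rough illustration (and definitely not official AVS client code): the Speak directive's cid: URL refers to the Content-ID header of an attachment in the multipart AVS response. Assuming you have already saved a raw multipart response body to disk, a minimal Node.js sketch of pulling the audio part out might look like this (the file name and boundary are illustrative):

// Rough sketch only: extract the MP3 attachment that a SpeechSynthesizer
// Speak directive references (via its "cid:" URL) from a saved AVS
// multipart response body. File name and boundary are illustrative;
// a real AVS client parses the multipart stream over HTTP/2.
const fs = require('fs');

const BOUNDARY = '--------abcde123'; // taken from the response Content-Type header
const body = fs.readFileSync('avs-response.bin').toString('binary');

for (const part of body.split(BOUNDARY)) {
  const splitAt = part.indexOf('\r\n\r\n');
  if (splitAt === -1) continue;
  const headers = part.slice(0, splitAt);
  // The audio attachment is the octet-stream part whose Content-ID
  // matches the id in the directive's "cid:" URL.
  if (/Content-Type:\s*application\/octet-stream/i.test(headers)) {
    const audio = part.slice(splitAt + 4);
    fs.writeFileSync('speak.mp3', Buffer.from(audio, 'binary'));
  }
}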
You can play an audio clip of 'Jingle Bells' using the SSML audio tag. A maximum of five audio tags can be used in a single output response.
The audio clip must meet the following requirements:
The MP3 must be hosted at an Internet-accessible HTTPS endpoint. HTTPS is required, and the domain hosting the MP3 file must present a valid, trusted SSL certificate. Self-signed certificates cannot be used.
The MP3 must not contain any customer-specific or other sensitive information.
The MP3 must be a valid MP3 file (MPEG version 2).
The audio file cannot be longer than ninety (90) seconds.
The bit rate must be 48 kbps. Note that this bit rate gives a good result when used with spoken content, but is generally not a high enough quality for music.
The sample rate must be 16000 Hz.
Refer to this link for more clarity: Audio Tag.
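To make this concrete, here is a minimal sketch of what a singJingleBells handler could return, assuming the ASK SDK v2 for Node.js; the MP3 URL is a placeholder for wherever you host a clip that meets the requirements above:

// Minimal sketch of a singJingleBells handler (ASK SDK v2 for Node.js).
// The MP3 URL is a placeholder -- host your own clip that meets the
// requirements above (HTTPS, trusted certificate, 48 kbps, <= 90 seconds).
const SingJingleBellsHandler = {
  canHandle(handlerInput) {
    const request = handlerInput.requestEnvelope.request;
    return request.type === 'IntentRequest'
      && request.intent.name === 'singJingleBells';
  },
  handle(handlerInput) {
    // speak() wraps this in <speak>; the <audio> tag plays the clip
    // instead of having Alexa read the lyrics aloud.
    const speech = '<audio src="https://example.com/audio/jingle-bells.mp3" /> Merry Christmas!';
    return handlerInput.responseBuilder
      .speak(speech)
      .getResponse();
  },
};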
Related
I have a problem with Amazon Alexa.
I have started to develop a small skill in Alexa Developer Console.
Everything works perfectly when I test it in that console, but when I tell my Alexa device to open my skill, it tells me "I don't know about that".
I don't understand why; the email address is the same for the Developer Console and for the device. I'm sure that the invocation is correct. I tried disabling and re-enabling the skill from the Alexa app, but it still doesn't work.
Any ideas? Thank you!
It could be an internal recognition issue from the invocation name.
What I recommend you do:
Change the invocation name to something very simple, e.g. "test four".
Save the model and build the model.
Try "open test four" in the developer console.
Try "open test four" on the device.
If "open test four" doesn't work on the device, log in to the Alexa app again with the same email as the developer console and re-sync the device. Make sure the skill is enabled.
If it works with "test four", it means the invocation name you chose previously is not properly recognized by Alexa. You should keep it as simple as possible, or ask support to improve its recognition.
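For reference, the invocation name lives in the interaction model JSON, so the change looks like this (the intent shown is just a placeholder):

{
  "interactionModel": {
    "languageModel": {
      "invocationName": "test four",
      "intents": [
        { "name": "AMAZON.StopIntent", "samples": [] }
      ]
    }
  }
}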
I am developing an app where I can send live voice notes. It works just like Whatsapp voice notes, except that the recipient of the voice note can start playing the voice note before the author has finished sending it.
I developed a proof of concept using a webrtc media server. It works like this:
Alice wants to send a voice note to Bob, so she sets up a webrtc connection with the server and starts streaming audio.
The server records the audio as it receives it in file F.
Bob receives a notification saying Alice is streaming a voice note (she is still speaking).
Bob opens the app, sets up a webrtc connection with the server, and the server starts streaming the file F in the webrtc connection.
Is there a technology better suited to this type of task, or should I go along with what I have right now?
Since you're buffering the voice data as a file on the server, you don't really need WebRTC. You can just stream it as regular audio data. WebRTC would be a better fit if you wanted to stream the audio directly to Bob instead of passing it through the server first.
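As a rough sketch of that approach, assuming the recording is appended to a local file as it arrives (the path, port, and container format are illustrative), the server can keep the HTTP response open and flush newly appended bytes to Bob as they land:

// Rough sketch: serve a still-growing recording over plain HTTP so the
// recipient can start playback before the sender has finished.
const http = require('http');
const fs = require('fs');

const FILE = '/tmp/voice-note-F.ogg'; // the file the server is still writing

http.createServer((req, res) => {
  res.writeHead(200, { 'Content-Type': 'audio/ogg' });
  let offset = 0;
  let busy = false;

  // Poll for newly appended bytes and flush them to the client.
  // A real implementation would also detect end-of-recording and call res.end().
  const timer = setInterval(() => {
    if (busy) return;
    fs.stat(FILE, (err, stat) => {
      if (err || stat.size <= offset) return;
      busy = true;
      const chunk = fs.createReadStream(FILE, { start: offset, end: stat.size - 1 });
      offset = stat.size;
      chunk.pipe(res, { end: false });
      chunk.on('end', () => { busy = false; });
    });
  }, 250);

  req.on('close', () => clearInterval(timer));
}).listen(8080);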
I am making an audio skill using the audio player template with the source code from the official Amazon repo.
Additionally, I have followed the instructions and added the required PlayAudio intent with the required utterances.
I am using EchoSim to test my Skill. This is the JSON from SpeechSynthesizer.Speak:
{
  "directive": {
    "header": {
      "dialogRequestId": "dialogRequestId-d2e37caa-98b6-4aec-99b1-d24298e422d5",
      "namespace": "SpeechSynthesizer",
      "name": "Speak",
      "messageId": "43150bc3-5fe1-44f0-aeea-fbec4808a4ce"
    },
    "payload": {
      "url": "cid:GlobalDomain_ActionableAbandon_52324515-eee3-4232-b9e4-19edeab556c5_1919623608",
      "token": "amzn1.as-ct.v1.#ACRI#GlobalDomain_ActionableAbandon_52324515-eee3-4232-b9e4-19edeab556c5",
      "format": "AUDIO_MPEG"
    }
  }
}
My problem is: this links to an MP3 file, but no audio is playing. I was wondering whether this is indeed the correct response I should be getting, and whether it behaves this way simply because I am not testing on a device, or if there is anything I should modify.
Any insight is much appreciated.
A common issue with the AudioPlayer interface is its strict audio requirements, and this looks like the reason for your issue. The link provided by Amod is for SSML audio, not the AudioPlayer interface. Make sure to follow all the requirements for the audio stream:
The audio file must be hosted at an Internet-accessible HTTPS endpoint on port 443.
The web server must present a valid and trusted SSL certificate. Self-signed certificates are not allowed (really important). Many content hosting services provide this. For example, you could host your files at a service such as Amazon Simple Storage Service (Amazon S3) (an Amazon Web Services offering).
If the stream is a playlist container that references additional streams, each stream within the playlist must also be hosted at an Internet-accessible HTTPS endpoint on port 443 with a valid and trusted SSL certificate.
The supported formats for the audio file include AAC/MP4, MP3, PLS, M3U/M3U8, and HLS. Bitrates: 16 kbps to 384 kbps.
This information can be found in the official documentation below:
https://developer.amazon.com/en-US/docs/alexa/custom-skills/audioplayer-interface-reference.html#audio-stream-requirements
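To make this concrete, here is a minimal sketch of a handler that issues the AudioPlayer Play directive, assuming the ASK SDK v2 for Node.js; the stream URL is a placeholder and must satisfy the requirements above:

// Minimal sketch of a PlayAudio handler (ASK SDK v2 for Node.js).
// The stream URL is a placeholder -- it must satisfy the AudioPlayer
// requirements above (HTTPS on port 443, trusted certificate, etc.).
const PlayAudioHandler = {
  canHandle(handlerInput) {
    const request = handlerInput.requestEnvelope.request;
    return request.type === 'IntentRequest'
      && request.intent.name === 'PlayAudio';
  },
  handle(handlerInput) {
    return handlerInput.responseBuilder
      .speak('Playing your audio.')
      .addAudioPlayerPlayDirective(
        'REPLACE_ALL',                         // play behavior
        'https://example.com/audio/track.mp3', // stream URL (placeholder)
        'track-token-1',                       // token identifying this stream
        0                                      // offset in milliseconds
      )
      .getResponse();
  },
};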
I'm trying to export my Alexa Skill / import it into Dialogflow (used to be called API.AI), but I'm getting the following error message:
Invalid Alexa schema json file.
My Zip file is the index.js file and the node_modules folder zipped together. Then I added the Alexa Skill JSON named schema.json to the zip too, but it still gives the same error.
I cannot find instructions on how to export the correct Alexa .zip for import, nor how to format the zip to build it myself. I've been searching for a while -- does anyone know how to do this? (I emailed their support already, but no response yet.)
There were some updates to the Alexa Interaction Model, so the Dialogflow Alexa Importer doesn't seem to work anymore.
There are a few things to consider when porting an Alexa Model into a Dialogflow Agent:
Built-in Intents: You need to create custom Dialogflow intents for built-in Alexa intents like AMAZON.HelpIntent
Built-in Slots: Amazon offers a large variety of slots (e.g. AMAZON.Number) that need to be converted to Dialogflow. For this, Dialogflow offers System Entities. Find all System Entities here.
I created a complete step by step guide and video that uses the Jovo Language Model to translate an Alexa Model into a Dialogflow Agent. You can find it here: Tutorial: Turn an Alexa Interaction Model into a Dialogflow Agent.
Here is an example of the format for the zip: https://github.com/dialogflow/fulfillment-webhook-importer-nodejs/tree/master/skill/speechAssets
The zip should have two files: IntentSchema.json and SampleUtterances.txt
Here is how to get IntentSchema.json and SampleUtterances.txt:
Go to https://developer.amazon.com/edw/home.html#/skills to view all your skills.
Select the skill you'd like to export by clicking on its name.
On the left, select "Interaction Model" from the list to open the interaction model editor.
Copy the contents of the editor, paste them into your IntentSchema.json file, and save it.
Next, copy the contents of the editor in the "Sample Utterances" section, paste them into your SampleUtterances.txt file, and save.
Lastly, zip up your IntentSchema.json and SampleUtterances.txt files and upload them to Dialogflow.
I'm not sure if you're still working on this, but if anyone else is stuck: the files you zip have to be named IntentSchema.json and SampleUtterances.txt exactly.
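For reference, the legacy format the importer expects looks roughly like this; the intent names and utterances here are made up. IntentSchema.json:

{
  "intents": [
    { "intent": "PlayAudio", "slots": [] },
    { "intent": "AMAZON.HelpIntent" }
  ]
}

And SampleUtterances.txt is plain text, one utterance per line, each prefixed with its intent name:

PlayAudio play the audio
PlayAudio start playback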
I am building an instructional Alexa skill using the Alexa Skills Kit SDK for Node.js. I am saving each cooking step to the DB, so that if the skill times out, the user can reopen the skill and continue where they left off.
The problem is that users are annoyed that they have to keep reopening the skill; people work at different speeds. Is it possible to keep the skill open, or increase the timeout, whilst I wait for the user to complete the step and then say "Alexa, next step"?
I tried increasing the Lambda timeout, but it made no difference.
I have been trying to do this for quite a while. There have been several responses on the Amazon developer forums from folks at Amazon (for example, this response) stating that the approximate 8-10 second timeout is not configurable.
The following solution is a bit of a hack and not recommended, but it may serve your purpose.
Just modify your response like below:
<speak>
Tell recipe step here.
<audio src="<-- Hosted silent mp3 file URL -->" />
</speak>
You can add a silent MP3 file to your response; your skill will stay open for the duration of that MP3 file.
But to interrupt Alexa in the middle of this response, the user will have to say "Alexa, next step" instead of just "Next step".
There is also an API you can call to provide a progressive response.
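For completeness, here is a minimal sketch of sending a progressive response with the ASK SDK v2 for Node.js. Note that a progressive response fills the silence while your skill is still working on a request; it does not extend the timeout while waiting for the user. It assumes the skill builder was configured with an API client, e.g. .withApiClient(new Alexa.DefaultApiClient()):

// Minimal sketch: send a progressive response while the skill prepares
// its full response (ASK SDK v2 for Node.js).
async function sendProgressiveResponse(handlerInput, speech) {
  const { requestEnvelope, serviceClientFactory } = handlerInput;
  const directiveServiceClient = serviceClientFactory.getDirectiveServiceClient();
  return directiveServiceClient.enqueue({
    header: {
      requestId: requestEnvelope.request.requestId,
    },
    directive: {
      type: 'VoicePlayer.Speak',
      speech, // e.g. 'One moment while I fetch the next step.'
    },
  });
}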