I am making an audio skill using the audio player template with the source code from the official Amazon repo.
I have also followed the instructions and added the required PlayAudio intent with its sample utterances.
I am using EchoSim to test my Skill. This is the JSON from SpeechSynthesizer.Speak:
{
  "directive": {
    "header": {
      "dialogRequestId": "dialogRequestId-d2e37caa-98b6-4aec-99b1-d24298e422d5",
      "namespace": "SpeechSynthesizer",
      "name": "Speak",
      "messageId": "43150bc3-5fe1-44f0-aeea-fbec4808a4ce"
    },
    "payload": {
      "url": "cid:GlobalDomain_ActionableAbandon_52324515-eee3-4232-b9e4-19edeab556c5_1919623608",
      "token": "amzn1.as-ct.v1.#ACRI#GlobalDomain_ActionableAbandon_52324515-eee3-4232-b9e4-19edeab556c5",
      "format": "AUDIO_MPEG"
    }
  }
}
My problem is that this links to an MP3, but no audio plays. Is this the correct response I should be getting, and is it behaving this way simply because I am not testing on a device, or is there something I should modify?
Any insight is much appreciated.
A common issue with the AudioPlayer interface is its strict audio requirements, and that looks like the reason for your problem. The link provided by Amod is for SSML, not the AudioPlayer. Make sure to follow all of the requirements for the audio stream:
The audio file must be hosted at an Internet-accessible HTTPS endpoint on port 443.
The web server must present a valid and trusted SSL certificate. Self-signed certificates are not allowed (really important). Many content hosting services provide this. For example, you could host your files at a service such as Amazon Simple Storage Service (Amazon S3), an Amazon Web Services offering.
If the stream is a playlist container that references additional streams, each stream within the playlist must also be hosted at an Internet-accessible HTTPS endpoint on port 443 with a valid and trusted SSL certificate.
The supported formats for the audio file include AAC/MP4, MP3, PLS, M3U/M3U8, and HLS. Bitrates: 16 kbps to 384 kbps.
This information can be found in the official documentation below, and a minimal example response is sketched after the link:
https://developer.amazon.com/en-US/docs/alexa/custom-skills/audioplayer-interface-reference.html#audio-stream-requirements
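For reference, here is a minimal sketch of a skill response that starts playback with the AudioPlayer interface; the stream URL and token are placeholders, not values from your skill:

{
  "version": "1.0",
  "response": {
    "shouldEndSession": true,
    "directives": [
      {
        "type": "AudioPlayer.Play",
        "playBehavior": "REPLACE_ALL",
        "audioItem": {
          "stream": {
            "url": "https://example.com/path/to/audio.mp3",
            "token": "example-track-token",
            "offsetInMilliseconds": 0
          }
        }
      }
    ]
  }
}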
I am using React and Flask. After getting the torrent of an .mp4 file stored in a private S3 bucket, I am trying to display and play it in the browser using WebTorrent, but the video content is not loading.
This is a follow-up question to Load and play 1 GB .mp4 in reactjs, stored in private s3 bucket. For now, the S3 bucket is public, because the private file was throwing an error.
Code for reference - https://codepen.io/drngke/pen/abNGbEg
// URL of the .torrent file stored in S3
const magnet = 'https://datavocal.s3.amazonaws.com/s3outputx.mp4.torrent'

const client = new WebTorrent()

client.add(magnet, (torrent) => {
  console.log(torrent.files)
  // Render the first file in the torrent as a media element in the page
  torrent.files[0].appendTo('body')
})

client.on('error', (err) => console.log(err))
I'm not sure if this will work.
From the webtorrent docs:
To make BitTorrent work over WebRTC (which is the only P2P transport that works on the web) we made some protocol changes. Therefore, a browser-based WebTorrent client or "web peer" can only connect to other clients that support WebTorrent/WebRTC.
And further:
To seed files to web peers, use a client that supports WebTorrent, e.g. WebTorrent Desktop, a desktop client with a familiar UI that can connect to web peers, webtorrent-hybrid, a command line program, or Instant.io, a website.
So I'm guessing S3 would have to support WebTorrent/WebRTC, which I don't think it does.
If my understanding is correct, you could run a hybrid client between S3 and your web peers, but you'd then need to host the hybrid client somewhere, which rather makes S3 redundant in that setup (a rough sketch is below).
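As a rough, untested sketch of that idea: seeding from a Node process with webtorrent-hybrid (which exposes the same API as webtorrent but can connect to WebRTC "web peers") might look like this; the local file path is hypothetical:

// Runs in Node and can talk to browser peers over WebRTC
const WebTorrent = require('webtorrent-hybrid')
const client = new WebTorrent()

// Seed the file fetched from S3 (path is a placeholder)
client.seed('./s3outputx.mp4', (torrent) => {
  // Browser clients can now add torrent.magnetURI with the regular webtorrent package
  console.log('seeding', torrent.magnetURI)
})

client.on('error', (err) => console.error(err))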
I am developing a simple audio client for Watson Assistant Solutions and I am having problems authorizing the client.
I am following this guide https://watson-personal-assistant.github.io/developer/audio/audio_authentication/ but the API key I am using is not recognized.
The error message I get is the following:
"errorMessage": "Provided API key could not be found"
The API key I am using is the one displayed on the user's card (the one that appears when clicking the user's avatar in the top-right corner of the page).
In the console there is the Clients tab which states:
A client can be a device such as a smart speaker or wearable, but it could also be a mobile app or web-based chatbot. Use this page to create credentials for those clients and assign an entity to them.
I thought that an API key could be created here, but it cannot be.
The Watson Assistant Solutions service now uses an IAM API key instead of the API key for the multi-tenant Audio Gateway. This requires that you have an IBM Cloud ID account.
To create your own IBM IAM API key, use these directions: https://console.bluemix.net/docs/iam/userid_keys.html#userapikey
You also need your tenant ID, which you can find in the WASol Console.
Your client will have to send the following properties:
Server connection parameters. For the userID, note that it must not contain # . or other special characters (there is an issue we are fixing).
host=wa-audio-gateway.mybluemix.net
userID=carlos.ferreira
The IAM API key is used to authenticate the client device:
IAMAPIKey=yourIAMAPIkey
Which skill set to use (required parameter):
skillset=industry
Your tenant ID (required parameter):
tenantID=yourtenantID
Client language preference (optional parameter, default en-US):
language=en-US
Which STT and TTS engine converts audio to text and text to audio; possible values are watson and google (optional parameter, default watson):
engine=google
Controls the playback method: playback using an audio URL in the response [true], or playback by streaming audio from the server [false]:
urltts=false
You can find a reference Java implementation for the Audio Gateway here: https://github.com/Watson-Personal-Assistant/AudioClientSampleCodeJava
Please note that you also need to use an IBM API key for programmatic access to the WASol Core text routing service. Here is a code example I did to get an Amazon Echo Dot/Alexa skill to communicate with a WASol Assistant skill set.
I would like to have a service in App Engine flexible that runs a UDP server, taking incoming UDP traffic on a given port and redirecting it to another service in App Engine standard that uses HTTPS.
It is my understanding that flex environment allows opening UDP listen sockets and indeed my application starts the server OK. However, I cannot make any traffic reach the UDP server.
I suspect the problem is a GAE or Docker configuration problem but I cannot find documentation or similar issues online to solve it. All Google documentation for appengine flexible is around HTTPS. So any guidance would be helpful. I have several questions that I believe relate to my understanding on Flexible Appengine, the VM and Docker:
Is flex App Engine supposed to be used as a UDP server at all? The lack of documentation on UDP load balancing seems to indicate to me that it is not... Any idea if this is on the roadmap?
If it is supported, to which IP/URL should I direct my UDP traffic? Is it my-project.appspot.com, or each of the individual VM instances (which would seem like a bad idea since VMs are ephemeral)?
This is my current application
app.yaml
As you can see, I forwarded the UDP listen port as explained here:
runtime: python
env: flex
entrypoint: python main.py

runtime_config:
  python_version: 2

network:
  forwarded_ports:
    - 13949/udp

service: udp-gateway

# This sample incurs costs to run on the App Engine flexible environment.
# The settings below are to reduce costs during testing and are not appropriate
# for production use. For more information, see:
# https://cloud.google.com/appengine/docs/flexible/python/configuring-your-app-with-app-yaml
manual_scaling:
  instances: 1
resources:
  cpu: 1
  memory_gb: 0.5
  disk_size_gb: 10
For the server I am using Python's SocketServer in threaded mode, and I am keeping my main thread in an infinite loop so that the server does not exit (a rough sketch is below).
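A minimal sketch of that kind of server, assuming Python 2 (matching python_version: 2 in app.yaml) and the forwarded port 13949, looks roughly like this:

import SocketServer
import threading
import time

class UDPHandler(SocketServer.BaseRequestHandler):
    def handle(self):
        # For UDP servers, self.request is a (data, socket) pair
        data, sock = self.request
        print "received %r from %s" % (data, self.client_address)
        sock.sendto(data, self.client_address)

if __name__ == "__main__":
    server = SocketServer.ThreadingUDPServer(("0.0.0.0", 13949), UDPHandler)
    worker = threading.Thread(target=server.serve_forever)
    worker.daemon = True
    worker.start()
    # Keep the main thread alive so the process does not exit
    while True:
        time.sleep(60)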
I have also added a firewall rule in my GCP console:
{
  "allowed": [
    {
      "IPProtocol": "udp",
      "ports": [
        "13949"
      ]
    }
  ],
  "creationTimestamp": "2018-02-24T16:39:24.282-08:00",
  "description": "allow udp incoming on 13949",
  "direction": "INGRESS",
  "id": "4340622718970870611",
  "kind": "compute#firewall",
  "name": "allow-udp-13949",
  "network": "projects/xxxxxx/global/networks/default",
  "priority": 1000,
  "selfLink": "projects/xxxxx/global/firewalls/allow-udp-13949",
  "sourceRanges": [
    "0.0.0.0/0"
  ]
}
So I ended up being able to answer my own questions (thanks SO for letting me put down my thoughts, it helps :)).
Indeed, the flex environment only provides a load balancer for HTTPS, which means that even though it is possible to open UDP sockets, it is not meant to be used as a UDP server. I have found no evidence that Google plans to add UDP/TCP load balancing for App Engine flex. The nearest service that offers UDP load balancing is Kubernetes Engine (and Compute Engine, of course), so that is where I am headed now.
With the configuration described in the OP, I could make traffic reach my application by addressing an individual instance's IP. However, this is not suitable for a production application, since instances are ephemeral, and it does not scale (I would need to build my own load balancer, which is out of the question).
I'm creating a seasonal Alexa skill with intents such as 'how many sleeps till Christmas', 'am I on the good list', etc. I'd also like an intent that asks Alexa to sing Jingle Bells. The key part is making her sing it.
In my skill, for the singJingleBells intent, I output the lyrics of Jingle Bells as the speech response, but Alexa just reads the lyrics (as expected, if I'm honest).
I've discovered there is a (presumably official Amazon) skill that makes her sing Jingle Bells. You can say "Alexa, sing Jingle Bells".
I would like my skill to do the same.
I'm guessing the Amazon skill does it with SSML phonetics, or more likely a pre-recorded MP3 via either an SSML audio tag or an SSML speechcon interjection.
Is there any way to discover/capture the output response of the Amazon skill so that I can understand (and copy!) how it does it?
Using Steve's idea, I can use the console on echosim.io to capture the SpeechSynthesizer directive. Not sure if this gets me any closer?
{
  "directive": {
    "header": {
      "dialogRequestId": "dialogRequestId-6688b290-80d3-4111-a29d-4c60c6d47c31",
      "namespace": "SpeechSynthesizer",
      "name": "Speak",
      "messageId": "c5771361-2a80-4b00-beb6-22a783a7c504"
    },
    "payload": {
      "url": "cid:b438a3ea-d337-4c5f-b719-816e429ed473#Alexa3P:1.0/2017/11/06/20/94a9a7c4112b44568bff10df69d30825/01:18::TNIH_2V.f000372f-b147-4bea-81fb-4c2e7de67334ZXV/0_359577804",
      "token": "amzn1.as-ct.v1.Domain:Application:Knowledge#ACRI#b438a3ea-d337-4c5f-b719-816e429ed473#Alexa3P:1.0/2017/11/06/20/94a9a7c4112b44568bff10df69d30825/01:18::TNIH_2V.f000372f-b147-4bea-81fb-4c2e7de67334ZXV/0",
      "format": "AUDIO_MPEG"
    }
  }
}
If I understand correctly, you want to get the Alexa audio output into an .mp3 file (or some other format) so that it can be played back again in a custom skill.
If that's the goal, you'll need to use the Alexa Voice Service (AVS) and more specifically the SpeechSynthesizer Interface to get the audio output that you'd then use in your custom skill response.
So, you'll be using both the Alexa Skills Kit (for the skill) and the Alexa Voice Service (AVS) to get the audio.
You can include an audio clip of 'Jingle Bells' using the SSML audio tag. A maximum of five audio tags can be used in a single output response.
The audio clip must meet the following requirements:
The MP3 must be hosted at an Internet-accessible HTTPS endpoint. HTTPS is required, and the domain hosting the MP3 file must present a valid, trusted SSL certificate. Self-signed certificates cannot be used.
The MP3 must not contain any customer-specific or other sensitive information.
The MP3 must be a valid MP3 file (MPEG version 2).
The audio file cannot be longer than ninety (90) seconds.
The bit rate must be 48 kbps. Note that this bit rate gives a good result when used with spoken content, but is generally not a high enough quality for music.
The sample rate must be 16000 Hz.
Refer to this link for more clarity: Audio Tag. A rough example of such a response is sketched below.
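For illustration, a skill response that plays such a clip via the audio tag could look roughly like this; the MP3 URL is a placeholder and would need to meet the requirements listed above:

{
  "version": "1.0",
  "response": {
    "outputSpeech": {
      "type": "SSML",
      "ssml": "<speak>Here you go. <audio src=\"https://example.com/jingle-bells.mp3\"/> Merry Christmas!</speak>"
    },
    "shouldEndSession": true
  }
}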
If you would like to use S3 from the Swisscom Application Cloud with the popular Cyberduck app, you have to use a custom connection profile with the AWS2 signature.
You can find this profile here for download
Authentication with signature version AWS2
Incomplete list of known providers that require the use of AWS2
Riak Cloud Storage
EMC Elastic Cloud Storage
Thank you very much for sharing this nice tool tip. I have added a few steps here for clarification.
1) brew cask install cyberduck
2) Download the linked S3 AWS2 Signature Version (HTTPS).cyberduckprofile file and open it with Cyberduck.
3) Copy the credentials and host from cf env or create service keys.
System-Provided:
{
  "VCAP_SERVICES": {
    "dynstrg": [
      {
        "credentials": {
          "accessHost": "ds31s3.swisscom.com",
          "accessKey": "24324234234243456546/CF_P8_FFGTUZ_TGGLJS_JFG_B347EEACE",
          "sharedSecret": "sfdklaslkfklsdfklmsklmdfklsd"
        },
        "label": "dynstrg",
        "name": "cyberduck-testing",
        "plan": "usage",
        "provider": null,
        "syslog_drain_url": null,
        "tags": [],
        "volume_mounts": []
      }
    ],
sharedSecret is named "Secret Access Key" in Cyberduck.
Create an initial bucket (it's called a Folder in Cyberduck).
Upload some files with drag and drop.
Some open-source command-line alternatives that some people use with Swisscom's EMC Atmos (dynstrg service) are:
S3cmd
S3cmd is a free command line tool and client for uploading, retrieving
and managing data in Amazon S3 and other cloud storage service
providers that use the S3 protocol, such as Google Cloud Storage or
DreamHost DreamObjects. It is best suited for power users who are
familiar with command line programs. It is also ideal for batch
scripts and automated backup to S3, triggered from cron, etc.
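A rough sketch of an ~/.s3cfg pointing s3cmd at such an S3-compatible endpoint with the AWS2 signature might look like this; the host is taken from the VCAP_SERVICES example above, and the keys are placeholders:

# ~/.s3cfg (excerpt)
access_key = YOUR_ACCESS_KEY
secret_key = YOUR_SHARED_SECRET
host_base = ds31s3.swisscom.com
host_bucket = %(bucket)s.ds31s3.swisscom.com
use_https = True
# Force the older AWS2 signing scheme required by this provider
signature_v2 = True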
Minio Client
Minio Client is a replacement for ls, cp, mkdir, diff and rsync
commands for filesystems and object storage.