I am interested in querying Amazon Alexa with mp3 files that contain voice commands. I know that Amazon has endpoints (SpeechRecognizer 2.3) that take in MP3 files but I am not sure if this will actually query the Alexa Service -- or more importantly interact with my skill. Any help would be appreciated!
I did this like 4+ years ago. I had to create a virtual AVS device and then a bespoke client using the AVS API. It's a lot of work.
Is there something special about these files that you need to use mp3? You can batch test audio files for speech recognition (is the speech to text what you expect?) with the ASR tool in the developer console.
With the NLU evaluation tool in the console, you can batch test utterances (formatted in JSON) to see which intents they trigger and what values they return in the slots.
And if you're working on unit tests for multi-utterance exchanges, you can use the ASK CLI or the ASK SMAPI API for automation.
The only one of these that uses MP3s is the ASR tool. The rest work with text.
I'm working on an application where we want to try a robot voice for user interactions instead of the current Speech Services standard voices.
That would make the application more exciting since our bot will be talking to kids.
The application will speak Brazilian Portuguese.
Questions:
Is there a built-in language model that would accomplish that for pt-BR?
If not, would it be possible to customize the standard voice via SSML or C#?
Suggestions are also welcome!
You can look into using espeak to generate a robot-sounding voice. You can also do it in SSML using the "range" attribute of the prosody element. Currently only Microsoft engines (Azure cloud, SAPI5 and WinRT's Windows.Media.Speech) support the "range" attribute.
Example:
<speak version="1.0" xml:lang="pt-BR">
<prosody pitch="x-low" range="-100%">All your base are belong to us</prosody>
</speak>
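If the text changes at runtime, the same markup can be generated rather than typed by hand. Below is a minimal sketch in Python that only builds the SSML string from the example above. The function name `robot_ssml` and its parameters are made up for illustration, and sending the result to a speech engine is left out, since that depends on the SDK you use.

```python
from xml.sax.saxutils import escape, quoteattr


def robot_ssml(text, lang="pt-BR", pitch="x-low", pitch_range="-100%"):
    """Build an SSML string with a flat, low-pitched prosody (hypothetical helper).

    Assumption: the target engine honours the "range" attribute, which the
    answer above says only Microsoft engines currently do.
    """
    return (
        f'<speak version="1.0" xml:lang={quoteattr(lang)}>'
        f"<prosody pitch={quoteattr(pitch)} range={quoteattr(pitch_range)}>"
        f"{escape(text)}"  # escape &, < and > so the markup stays well-formed
        "</prosody></speak>"
    )


print(robot_ssml("All your base are belong to us"))
```

Escaping matters here because user-provided text containing `&` or `<` would otherwise produce invalid SSML.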
I have some devices I want to give to my clients, e.g. for them to take home.
Basically, I want them to be able to ask the device, like an Echo Dot:
Ask MYAPP, what song is Number One
Ask MYAPP, what song is Number Two
.. etc
and then it reads the name of a song.
My questions, given that I have never worked with Alexa or any Amazon service:
How long will it take to get it certified?
Do I need to get it certified?
Is there an issue with playing a song?
I don't own a device; can I test it well enough without owning one?
Is the Alexa skill API easy enough that I can get this done rather quickly, or is it difficult to get started?
What's a good place to help get me started? I quickly looked at creating a skill set and the procedure seems heavyweight. Is there maybe a forum or some chat where the gurus hang out?
How long will it take to get it certified? - Once you submit the app, it will take at most 7 business days to get certified (most of my apps were certified in 2 days). Please read the certification checklist.
Do I need to get it certified? - Yes, it has to be certified for your app to be available in the Amazon Alexa skill store. If it is not in the skill store, other people cannot enable it on their devices and it will be available only in your account. To test the app you don't need certification, as you can try it from your own Amazon account.
Is there an issue with playing a song? - You can play any audio file, but the current limit for an audio file is 90 seconds. See the documentation for details.
I don't own a device, can I test it well enough without owning one? - You don't need a device to test it. You can use echosim - https://echosim.io/ to test your app. Alternatively, you can use a Raspberry Pi, since you can set one up as an Alexa-enabled device.
Is the Alexa skill API easy and I get this done rather quickly or is it difficult to get started? - It is very easy to do. Trust me, I learned it and created an app in a week or so.
What's a good place to help get me started? - First you need an Amazon account (I believe you already have one). Below are links to simple end-to-end samples:
https://developer.amazon.com/alexa-skills-kit/alexa-skill-quick-start-tutorial
https://developer.amazon.com/blogs/post/Tx3DVGG0K0TPUGQ/New-Alexa-Skills-Kit-Template:-Step-by-Step-Guide-to-Build-a-Fact-Skill
There are a couple of courses available on Udemy as well.
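To give a feel for how small the backend of such a skill can be, here is a rough sketch of a handler for the "what song is number one" use case. It is written as a plain Python function that takes the request JSON and returns the response JSON, without the ASK SDK. The intent name `SongAtRankIntent`, the slot name `rank` and the `SONGS` table are all hypothetical; you would define your own in the interaction model.

```python
# Hypothetical chart data; a real skill would load this from its own source.
SONGS = {1: "Song A", 2: "Song B"}


def build_speech_response(text, end_session=True):
    """Wrap plain text in the JSON envelope a custom skill returns."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": end_session,
        },
    }


def lambda_handler(event, context=None):
    """Entry point in the style of an AWS Lambda handler."""
    request = event.get("request", {})
    if request.get("type") == "LaunchRequest":
        return build_speech_response(
            "Ask me what song is number one.", end_session=False
        )
    if request.get("type") == "IntentRequest":
        intent = request.get("intent", {})
        # "SongAtRankIntent" and the "rank" slot are assumed names.
        if intent.get("name") == "SongAtRankIntent":
            raw = intent.get("slots", {}).get("rank", {}).get("value")
            try:
                rank = int(raw)  # slot values arrive as strings
            except (TypeError, ValueError):
                return build_speech_response(
                    "Which chart position do you mean?", end_session=False
                )
            title = SONGS.get(rank)
            if title is None:
                return build_speech_response(f"I have no song at number {rank}.")
            return build_speech_response(f"Number {rank} is {title}.")
    return build_speech_response("Sorry, I did not understand that.")
```

Because the handler is just a function over dictionaries, you can exercise it locally with hand-written request JSON before touching a device or simulator.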
Since this question is linked as related from some current #alexa-skill questions, I'd like to give some updates on the points where Amazon has improved the Alexa environment in the 5 years since the initial answers:
Do I need to get it certified? Besides publishing a skill to all Alexa users, there are some other possibilities. You could add further users to your account (but be aware that they then see all skills and, depending on their roles, might also make changes). Another option for access at skill level is beta testing, but this is very limited. The last option is Alexa for Business, where a skill can be distributed to the devices of an organization. This is quite complex, but offers additional context and the option to limit accessibility of the skill to just the organization.
Is there an issue with playing a song? Besides integrating the audio in SSML, you have the AudioPlayer directive, but be aware that while your audio length then has no limit, you leave the skill session. With Alexa Presentation Language Audio (APL-A) you keep the dialog session and have more audio capabilities than with SSML, but you still face length limits. Staying inside the skill without audio length limits is possible by using the APL (Video-)Player component with size 0, but this limits your skill to devices with a screen.
I don't own a device, can I test it well enough without owning one? The previous answer is no longer valid, since echosim.io has been offline since April 5, 2021. But nowadays the developer console has a very good simulator. Additionally, you can use a local simulator with Visual Studio Code and the ASK Toolkit.
Is the Alexa skill API easy and I get this done rather quickly or is it difficult to get started? In the last few years, Amazon has extended the options for building and hosting a skill. With Alexa-hosted skills you do not need to care about AWS or about connecting the Alexa cloud with AWS or your own hosting solution, and you can still make use of all skill features. If you need only simple logic, you could use Alexa Blueprints, which cover the logic while you just provide the content (if you find a blueprint that matches the logic you need). By the way, this is also an additional option for the certification question, since a blueprint is normally just for your account, but you can share your blueprint instance with others, too.
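To make the two basic audio options from the song question more concrete, here is a rough sketch of the response JSON for each: short audio embedded in SSML, and long audio handed off with an AudioPlayer.Play directive. The helper names, URLs and token are placeholders I made up. As far as I know the stream URL has to be HTTPS, and the skill must have the AudioPlayer interface enabled for the second variant to work.

```python
from xml.sax.saxutils import quoteattr


def ssml_audio_response(url):
    """Short clip played inside the skill session via the SSML audio tag."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {
                "type": "SSML",
                # quoteattr escapes characters such as & in the URL
                "ssml": f"<speak><audio src={quoteattr(url)}/></speak>",
            },
            "shouldEndSession": True,
        },
    }


def audio_player_response(url, token):
    """Long audio streamed via AudioPlayer; playback leaves the skill session."""
    return {
        "version": "1.0",
        "response": {
            "directives": [
                {
                    "type": "AudioPlayer.Play",
                    "playBehavior": "REPLACE_ALL",
                    "audioItem": {
                        "stream": {
                            "url": url,      # placeholder stream URL
                            "token": token,  # opaque id chosen by the skill
                            "offsetInMilliseconds": 0,
                        }
                    },
                }
            ],
            "shouldEndSession": True,
        },
    }
```

The trade-off described above shows up directly in the shape of the response: the SSML variant is ordinary speech output, while the directive variant hands playback over to the device.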
I've written an AIML file for a chat bot and I'd like to build an interactive web application which allows me to chat with the bot in the web browser.
Is it possible to achieve this with HTML & Javascript?
There is no short answer on how to write a web application which allows a user to interact with your AIML. Writing such an application from scratch will be much more work than compiling the AIML was.
The easiest option would be to use a pre-built service like PandoraBots, which allows you to upload AIML files and interact with them in the web browser. It's free to use the explorer part of the website. They also have paid developer options which generate an API to bridge your AIML script and any applications you might want to build. It can be easily connected to work with common chat apps like Google Talk, etc.
If you decide to build everything from scratch you might want to check out the AIML Interpreter library for nodejs.
UPDATE: Here is a node.js based interpreter that you might find useful https://github.com/mrchimp/surly2
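To illustrate what "from scratch" involves at its very smallest, here is a toy sketch of the matching step only. It is written in Python to keep it short, but the same logic ports to JavaScript. It handles a tiny subset of AIML (plain `<pattern>` and `<template>` text plus the `*` wildcard) and ignores things like `<star/>`, `<srai>` and `<that>`. The sample AIML and the function names are made up, and the priority rule is a simplification of real AIML matching.

```python
import re
import xml.etree.ElementTree as ET

# Made-up sample file; replace with your own AIML.
AIML = """<aiml version="1.0">
  <category><pattern>HELLO</pattern><template>Hi there!</template></category>
  <category><pattern>MY NAME IS *</pattern><template>Nice to meet you.</template></category>
  <category><pattern>*</pattern><template>Tell me more.</template></category>
</aiml>"""


def normalize(text):
    """Upper-case the input and replace punctuation with spaces."""
    return " ".join(re.sub(r"[^\w\s]", " ", text).upper().split())


def load_categories(aiml_source):
    """Return (compiled regex, template text) pairs, most specific first."""
    entries = []
    for category in ET.fromstring(aiml_source).iter("category"):
        tokens = (category.findtext("pattern") or "").upper().split()
        template = (category.findtext("template") or "").strip()
        # "*" matches one or more words; everything else matches literally.
        parts = ["(.+)" if tok == "*" else re.escape(tok) for tok in tokens]
        # Simplified priority: fewer wildcards first, then longer patterns.
        key = (tokens.count("*"), -len(tokens))
        entries.append((key, re.compile(" ".join(parts)), template))
    entries.sort(key=lambda entry: entry[0])
    return [(regex, template) for _, regex, template in entries]


def respond(text, categories):
    """Return the template of the first matching category, or None."""
    cleaned = normalize(text)
    for regex, template in categories:
        if regex.fullmatch(cleaned):
            return template
    return None


categories = load_categories(AIML)
print(respond("Hello!", categories))
```

A web front end would then only need to send the user's text to this function (or its JavaScript equivalent) and display the returned string.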
I was looking at AIML too and had similar questions. I just found RiveScript, and it looks like it fits your need to run JavaScript based on a match. It is not AIML, but very close. There is also at least one tool to convert from AIML to RiveScript, so I would say this fits your needs within those constraints.
I want to implement a feature where you scan a real-world scene with your phone, generate a feature code from the image, and upload that code to a cloud service. If the cloud service's database contains the code, you can download something related to the image. My main problem is that I need a system or cloud service to identify the images for me, and I don't want to build too much myself. Is there an existing cloud service that supports this? Free or paid is fine.
Microsoft recently launched a new set of machine-learning APIs called "Project Oxford" that include functionality for face detection and recognition, speech recognition and synthesis, vision, and natural language understanding.
Face APIs provide state-of-the-art algorithms to process face images, like face detection with gender and age prediction, recognition, alignment and other application level features. For more information, see Project Oxford at www.projectoxford.ai/face.
Related Link http://azure.microsoft.com/en-in/marketplace/partners/faceapis/faceapis/
http://www.codeproject.com/Articles/989752/Integrate-Windows-Azure-Face-APIs-in-a-Cplusplus-a
I am trying to build a web application that can capture audio and video from a web cam and upload it to our server. The solution should work with both Windows and Mac. Supporting mobile devices would be a plus, but is not required. My boss would prefer if the platform/framework was from Microsoft.
My initial impulse was to start looking into Silverlight... Interestingly, there were plenty of demos showing how to capture video and display it to the user, followed by many comments suggesting that for the application to be useful we need some way to save/upload the video, followed by the original poster saying that of COURSE it's possible and easy and that he is working on an updated demo that does just that, followed by silence. As far as I can tell, Silverlight will not record video.
I already have a component that can record video in a winforms application using DirectShow, but the goal is to build something that is cross-platform so that our program will work for Mac users as well as Windows users. A desktop application is not out of the question, but we would much prefer to stick to a web page.
I am aware that Flash can record video from within a browser, but the higher ups would prefer to avoid flash. Is there any other way to record video captured from a user's webcam from within a web browser?
To build a cross-platform solution, you could consider one of:
VLCj
Xuggler
JMF
I have been working lately with VLCj.
I am aware that Flash can record video from within a browser, but the higher ups would prefer to avoid flash. Is there any other way to record video captured from a user's webcam from within a web browser?
Unfortunately, on the desktop there is no other production-ready way to record video in a web page except a Flash client linked to a media server like Red5 or Wowza. The Flash client captures and encodes the video and audio, and the media server stores the encoded data in .flv or .f4v/.mp4 files.
On mobile you could use HTML Media Capture which, for recording video, is widely supported on all mobile browsers. The downside is that you'd end up with .mov files from iOS and .mp4 and .3gp files from Android devices. The .mov and .3gp files need to be transcoded before they can be used on other platforms.
Quick commercial solutions that implement the above include HDFVR (downloadable) and Pipe (cloud video recording).