I am using IBM Watson's Text to Speech API. How do you generate a slightly longer pause via the text? I would like to insert a pause or silence into the text so that when Watson converts the text to speech there is a noticeable pause of 1 or 2 seconds.
Sorry, I figured this out using the break tag, which is listed here: https://www.ibm.com/smarterplanet/us/en/ibmwatson/developercloud/doc/text-to-speech/SSML.shtml
Search for SSML (Speech Synthesis Markup Language) tags when using IBM Watson's Text to Speech API.
Example: <break time="3s"/>
This will add a pause of 3 seconds.
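For instance, a sentence with a two-second pause in the middle could be sent to the service like this (a minimal sketch; whether you need the surrounding <speak> wrapper depends on how you call the API):

<speak>
Hello, and welcome. <break time="2s"/> This sentence starts after a two-second pause.
</speak>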
Our Watson Speech to Text output includes many instances of the token <eps>. I cannot find any information in the service documentation describing what this is. What is it?
Is it possible to restrict an AVS device (a device running Alexa) to a single skill? If I built an AI skill and have it running on a device, is it possible to keep the experience inside the custom skill so I don't have to keep saying "Alexa, open ..."?
One trick you can do with AVS is to prepend every single request with a sound clip equivalent to "ask [skill name] to ...". It's definitely a hack, but I was able to use it with some success.
See my write-up here: https://www.linkedin.com/pulse/adding-context-alexa-will-blaschko-ma
The relevant parts (in case the link goes away):
Regular voice commands don't carry any extra information about the user, but I wanted to find a way to tack on metadata to the voice commands, and so I did just that: glued it right onto the end of the command and updated my intents to know what the new structure would be.
...
In addition to facial recognition, voice recognition could help identify users, but let's not stop there. Any amount of context can be added to a request based on available local data. "Find frozen yogurt nearby" could silently become "Alexa, open Yelp and find frozen yogurt near 1st and Pine, Seattle" using some built-in geolocation in the device (a phone, in this case).
I also use something similar in my open-source Android Alexa library to send prerecorded commands: https://github.com/willblaschko/AlexaAndroid
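For what it's worth, the prepend trick itself is mostly just audio concatenation. Here is a minimal sketch in Java, assuming both clips are raw 16-bit PCM at the same sample rate; the file paths and method name are hypothetical and not part of the AlexaAndroid API:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class PrependPrefix {

    // Concatenate a prerecorded "ask [skill name] to" clip in front of the
    // user's request before streaming the result to AVS. Assumes both files
    // are raw 16-bit PCM at the same sample rate (no WAV headers to merge).
    static byte[] prependInvocation(String prefixPath, String requestPath) throws IOException {
        byte[] prefix = Files.readAllBytes(Paths.get(prefixPath));
        byte[] request = Files.readAllBytes(Paths.get(requestPath));
        byte[] combined = new byte[prefix.length + request.length];
        System.arraycopy(prefix, 0, combined, 0, prefix.length);
        System.arraycopy(request, 0, combined, prefix.length, request.length);
        return combined; // send this buffer to AVS as a single utterance
    }
}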
I think you are looking for AWS Lex, which allows you to write Alexa-like skills without the rest of the Alexa feature set.
http://docs.aws.amazon.com/lex/latest/dg/what-is.html
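To give a feel for the difference: a Lex bot is invoked directly from your own code, with no wake word or "Alexa, open ..." step. A rough sketch using the AWS SDK for Java; the bot name, alias, and user id below are placeholders:

import com.amazonaws.services.lexruntime.AmazonLexRuntime;
import com.amazonaws.services.lexruntime.AmazonLexRuntimeClientBuilder;
import com.amazonaws.services.lexruntime.model.PostTextRequest;
import com.amazonaws.services.lexruntime.model.PostTextResult;

public class LexTextExample {
    public static void main(String[] args) {
        // Default credentials and region from the environment.
        AmazonLexRuntime lex = AmazonLexRuntimeClientBuilder.defaultClient();

        // Send the user's text straight to the bot -- no wake word involved.
        PostTextRequest request = new PostTextRequest()
                .withBotName("MyBot")        // placeholder bot name
                .withBotAlias("Prod")        // placeholder alias
                .withUserId("demo-user-1")   // any stable per-conversation id
                .withInputText("I want to order a pizza");

        PostTextResult result = lex.postText(request);
        System.out.println(result.getMessage()); // the bot's reply text
    }
}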
I am working on implementing a use case that uses the custom voice command "take a note" to get back a response with a picture from Glassware.
What's the best way to add an AcceptCommands property to the Java Starter Project's Contact in order to enable a custom voice command inside NewUserBootstrapper.bootstrapNewUser()?
Since the Contact class is final, it is not easily extended.
Will there be updated code in the Java Starter Project to use voice commands like "Take a note" or "Post an update", as mentioned in the XE8 online documentation?
What's your suggestion if we cannot or do not use a custom command to get back a picture? Can it be done without using a subscription?
Appreciate your help.
Lawrence
Make sure you are using the latest version of the Starter Project, or update your pom.xml to use google-api-services-mirror version v1-rev20-1.16.0-rc which has support for Contact.setAcceptCommands(). See https://github.com/googleglass/mirror-quickstart-java/commit/b7d140bd504643be30ba5d3b36e561e799dd2e61 for the patch that was applied 2 days ago to update to XE8 support, and https://developers.google.com/resources/api-libraries/documentation/mirror/v1/java/latest/com/google/api/services/mirror/model/Contact.html for the latest Contact documentation.
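For reference, registering the voice command on a contact might then look roughly like the following; this is a sketch only, with a placeholder id and display name, and it assumes an authorized Mirror client as built elsewhere in the Starter Project:

import java.io.IOException;
import java.util.Collections;
import com.google.api.services.mirror.Mirror;
import com.google.api.services.mirror.model.Command;
import com.google.api.services.mirror.model.Contact;

public class ContactBootstrap {

    // Insert a contact that accepts the built-in "Take a note" voice
    // command (available as of XE8). The id and display name are
    // placeholders; "mirror" is an authorized Mirror client.
    static void insertNoteContact(Mirror mirror) throws IOException {
        Contact contact = new Contact()
                .setId("my-note-glassware")           // placeholder id
                .setDisplayName("My Note Glassware")  // placeholder name
                .setAcceptCommands(Collections.singletonList(
                        new Command().setType("TAKE_A_NOTE")));
        mirror.contacts().insert(contact).execute();
    }
}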
I'm not sure I understand what you're trying to do when you talk about using a custom command (or not) to get a picture. The standard way to get a picture from Glass to your Glassware is to establish a Contact that accepts the content type and have people share to it. Can you describe the use case or flow better in your original post?
Is it possible to access the actual speech recording, rather than the transcribed speech text, from the Mirror API?
This is not currently available in the Mirror API.
If this is something you need, file an enhancement request and include a description of what you plan to use it for.
Is any Silverlight text-to-speech engine available now? I am looking for a very simple text-to-speech engine that only needs to read out numbers.
I don't want to rely on any web service. In the worst case I will record some voices for the numbers and stitch them together.
Any pointers are highly appreciated. My application need not work on Mac or Linux.
There is another option, which doesn't involve ActiveX or Silverlight 4 COM interop. You simply have your Silverlight application send the text to a WCF service, which converts the text to a WAV stream; you then decode the stream returned by the service and place it in a MediaStreamSource for playback in Silverlight. I wrote a blog post on it, and it includes sample code.
http://www.brianlagunas.com/index.php/2010/03/07/text-to-speech-in-silverlight-using-wcf
Converting text to speech using the Microsoft Speech (SAPI) COM API consists of a few simple steps. The following code shows the important pieces in performing text to speech (note that AutomationFactory requires an elevated-trust, out-of-browser Silverlight 4 application):

// Create the SAPI voice via COM automation (Windows only).
dynamic textToSpeech = AutomationFactory.CreateObject("Sapi.SpVoice");

// "book" is the caller's own data object holding the playback settings.
textToSpeech.Volume = book.Volume;     // 0 to 100
textToSpeech.Rate = book.SpeekSpeed;   // -10 (slowest) to 10 (fastest)
textToSpeech.Voice = book.speeker;     // expects a voice token, e.g. one of textToSpeech.GetVoices()
textToSpeech.Speak(book.Content);      // speaks the given string

SpVoice is the class that is used for text-to-speech conversion. The Speak method takes in the string that needs to be spoken.
code sample: http://funducodes.blogspot.com/p/silver-light.html
You will probably have to build your own for a truly cross-compatible application.
Silverlight 3: use ActiveX to call the Microsoft Speech SDK (not recommended at all).
Silverlight 4: use COM integration to call the Microsoft Speech SDK.
Both of these work only on Windows.
Of course, with all these suggestions, the underlying flaw is in the speech rendering engine itself: every one of these samples results in a nasty clicking at the start of the speech. I'm thinking this is garbage collection on the stream.
It would be nice to finally have something cross-platform that can create realistic speech.
I am not holding my breath.