I am working with Windows Phone speech recognition, and I would like to take the following speech:
I walked one hundred miles
and know, in my app, that "one hundred" means 100.
Any ideas?
According to the documentation, the RecognizedPhrase.Text property should contain the display text format, which is what you are asking for.
As part of the speech recognition process, the speech recognizer performs speech-to-text normalization of the recognized input into a display form.
For example, the spoken input, "twenty five dollars", generates a recognition result where the Words property contains the words, "twenty", "five", and "dollars", and the Text property contains the phrase, "$25.00". For more information about text normalization, see ReplacementText.
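The recognizer does this normalization for you, but if you ever need to turn number words into a value yourself (for example, when working from the Words collection), a common approach is a running total: add unit and tens words, multiply by 100 on "hundred", and flush into the total on "thousand" or "million". Below is a minimal sketch in C, assuming lowercase English words separated by spaces or hyphens. `words_to_number` is a hypothetical helper, not part of any Windows Phone API; the same logic ports directly to C#.

```c
#include <string.h>

/* Value of a single unit/teen/tens word, or -1 if it is not one. */
static long small_value(const char *w) {
    static const char *units[] = {"zero", "one", "two", "three", "four", "five",
        "six", "seven", "eight", "nine", "ten", "eleven", "twelve", "thirteen",
        "fourteen", "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"};
    static const char *tens[] = {"twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"};
    for (int i = 0; i < 20; i++)
        if (strcmp(w, units[i]) == 0) return i;
    for (int i = 0; i < 8; i++)
        if (strcmp(w, tens[i]) == 0) return 20 + 10 * i;
    return -1;
}

/* Hypothetical helper: converts e.g. "one hundred" to 100.
   Returns -1 if any word is not a number word. */
long words_to_number(const char *phrase) {
    char buf[256];
    long total = 0, current = 0;
    strncpy(buf, phrase, sizeof buf - 1);   /* strtok modifies its input */
    buf[sizeof buf - 1] = '\0';
    for (char *w = strtok(buf, " -"); w != NULL; w = strtok(NULL, " -")) {
        long v = small_value(w);
        if (v >= 0) current += v;
        else if (strcmp(w, "hundred") == 0) current *= 100;
        else if (strcmp(w, "thousand") == 0) { total += current * 1000; current = 0; }
        else if (strcmp(w, "million") == 0) { total += current * 1000000; current = 0; }
        else if (strcmp(w, "and") == 0) continue;   /* "two hundred and one" */
        else return -1;
    }
    return total + current;
}
```

With "I walked one hundred miles", you would first pick out the run of number words ("one hundred") and pass only that run to the function.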
I want to create a voice-changing feature for a meeting application using WebRTC.
I have two possible solutions:
Solution 1: Use a MediaStreamAudioSourceNode and connect it to other nodes to create an audio filter:
https://github.com/mdn/webaudio-examples#stream-source-buffer
But this way I can't control whether the voice sounds male or female.
Solution 2: Use speech to text and text to speech.
I use speech to text to get what the speaker says.
After that, I send the text to the other meeting participants and use text to speech to generate the voice.
But the transmission is slow, and the transcribed content is inaccurate.
Do you know of any AI service or library that supports this?
I am building an Angular test preparation app (with a Laravel 5.1 API). One of the requirements is to allow the user to print a certificate of achievement.
The client wants the person's name and credentials interpolated into a PDF template they sent (a certificate with those fields highlighted).
The way I'm handling PDF viewing is simply by storing the file on S3 and giving them a link to that file.
Interpolating information into a PDF document doesn't seem trivial, and I haven't found much information on doing it programmatically, but there are tools like DocHub that allow you to edit while viewing the PDF.
I'm interested in learning:
Is doing this programmatically trivial?
Are there 3rd-party tools I'm unaware of?
Would I even be able to pass this information along with the S3 link for interpolation in the first place?
Using PDF as a format for editing is usually a bad choice. If you have a form with fixed fields, then it's easy. Create a PDF template with an interactive form. In this form, based on AcroForm technology, you'll define fields with fixed coordinates, and a fixed size. You can then add content to these fields.
One major disadvantage of this approach is the lack of flexibility. Did you notice that I used the word "fixed" three times in the previous paragraph? If text doesn't fit the predefined field, you're out of luck. If the field is oversized, you'll end up with plenty of white space. This approach is great if you can predict what the data will be like. A typical use case is a ticket or a voucher. For instance: the empty form is a really nice page, with only a couple of fields where an automated system can put a name, a date, a time, and a seat number.
This isn't the best approach for the example you show in your screenshot. The position of every line of text, every word, every character is known in advance. If you want to replace a short word with a long word (or vice versa), then all those positions (of each line, of the complete page, possibly of the complete document) need to be recalculated. That's madness. Only people with very poor design skills come up with such an idea.
A better idea is to store the template as HTML. See, for instance, chapter 5 of iText's pdfHTML tutorial, where we have this snippet of HTML:
<html>
<head>
<title>Invitation to SXSW 2018</title>
</head>
<body>
<u><b>Re: Invitation</b></u>
<br>
<p>Dear <name>SXSW visitor</name>,
we hope you had a great SXSW film festival experience last year.
And we would like to invite you to the next edition of SXSW Film
that takes place from March 9 until March 17, 2018.</p>
<p>Sincerely,<br>
The SXSW crew<br>
<date>August 4, 2017</date></p>
</body>
</html>
Actually, it's not really HTML, because the <name> tag and the <date> tag don't exist in HTML. HTML processors (browsers as well as pdfHTML) ignore those unknown tags and treat their content as if each tag were a <span>.
It doesn't make much sense to have such tags in the context of pure HTML, but it does make a lot of sense in the case of pdfHTML. With pdfHTML, you can configure custom tags and produce personalized PDFs, for example one for "John Doe" and one for "Bruno Lowagie".
Look at the document for "John Doe" and compare it with the document for "Bruno Lowagie". The name "John Doe" is much shorter than my name, hence more words fit on that first line. The text flows nicely (we could also have chosen to justify the text on both sides). This "flow" is impossible to achieve with your approach, because you will never get a PDF template to reflow nicely.
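pdfHTML handles custom tags through its own configuration, which is not reproduced here. To illustrate only the simpler underlying idea (fill each placeholder tag in the HTML template with the recipient's data, then convert), here is a minimal string-level sketch in C. `fill_tag` is a hypothetical helper: it replaces only the first occurrence of the tag and does not HTML-escape the value.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copies tmpl into out, replacing the text between
   <tag> and </tag> with value. Returns 0 on success, -1 if the tag is
   missing or out is too small. */
int fill_tag(const char *tmpl, const char *tag, const char *value,
             char *out, size_t outsz) {
    char open[64], close[64];
    snprintf(open, sizeof open, "<%s>", tag);
    snprintf(close, sizeof close, "</%s>", tag);

    const char *start = strstr(tmpl, open);
    if (start == NULL) return -1;
    start += strlen(open);                  /* first char of placeholder text */
    const char *end = strstr(start, close); /* points at the closing tag */
    if (end == NULL) return -1;

    size_t prefix = (size_t)(start - tmpl);
    if (prefix + strlen(value) + strlen(end) + 1 > outsz) return -1;
    memcpy(out, tmpl, prefix);              /* everything up to and including <tag> */
    strcpy(out + prefix, value);            /* the recipient's data */
    strcat(out, end);                       /* </tag> and the rest of the template */
    return 0;
}
```

Because the name is inserted before layout happens, the converter can reflow the paragraph around it, which is exactly what a fixed-coordinate PDF form cannot do.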
OK, I get it, you probably say, but what about the practical aspects? You talk about a Java / .Net library, but I am working with Laravel and Angular.js. First, let me tell you that I don't think you'll find any good PDF tools for Laravel or Angular.js, because of the nature of PDF and those development environments (in my opinion, those technologies don't play well together). Regardless of my opinion, this shouldn't be much of a problem for you because you work in an Amazon environment. AWS supports Java, and the Java code needed to get pdfHTML working is minimal. Most of the code samples I wrote for the pdfHTML tutorial are shorter than 15 lines. So why not try Java and pdfHTML?
If you're already using Amazon services, why not use an AWS Lambda function, in combination with iText 7 (Java), to generate the PDF on demand?
That way, you are guaranteed that the PDF is correct and has a nice layout every time.
The PDF can be generated by:
1. converting HTML,
2. programmatically creating your entire document, or
3. filling and flattening an XFA form.
I think that for your use case, option 1 or 2 is the most sustainable.
I'm working on a little text editor. My application is a WinAPI one in C. The idea is to write text in a large text box (like in Notepad) and then, when I press a button, take all the text into a buffer, format it according to some rules, and put it in a .txt file.
For example, if my input is:
Anne \red(got) \blue(\bold(apples)) and \italic(\bold(snails!))
After I parse it, is it possible to put it into a .txt file so that, when I open the file, I see the text with the red, blue, bold, and italic formatting applied?
I want to thank everyone for their time. I got exactly the answer I wanted. Everyone here rocks!
I think that you are programming for fun, for the pleasure of it, and with the aim of learning more. If that is the objective, then it is okay to invent your own formats and try out your own solutions.
The problem presented can be twofold:
Does the formatted result need to be shown in the editor itself?
Or do you just need to produce something that will be rendered in an external program?
If you are after the first possibility, then you need some Win32 (given your environment) component that will show the formatting. That component is RichEdit, and it implements RTF, a codification that can be saved to a text file, and which is more or less standard.
If you have the second possibility in mind, then you can choose from a variety of codifications. You would just be creating a text editor, probably with some helpers that write part of the commands for the user. For example, you could be creating an HTML editor or an RTF editor.
There is a third possibility, though. You create your own codification, and when saving, you translate that codification to HTML, and then open the document in a web browser.
Say that you have:
\bold(hello), world.
You would translate that to:
<html><body><b>hello</b>, world.</body></html>
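Since your project is already in C, here is a minimal sketch of that third possibility: a small recursive translator that turns \name(...) commands into HTML tags. The command table (bold, italic, red, blue) and the function name `to_html` are assumptions for illustration. Nested commands are handled by recursion, and an unknown command or an unbalanced parenthesis is reported as an error.

```c
#include <ctype.h>
#include <string.h>

/* Output buffer with bounds checking. */
typedef struct { char *data; size_t len, cap; int overflow; } Out;

static void put(Out *o, const char *s) {
    size_t n = strlen(s);
    if (o->len + n + 1 > o->cap) { o->overflow = 1; return; }
    memcpy(o->data + o->len, s, n + 1);
    o->len += n;
}

/* Hypothetical command table: \name( ... ) becomes open ... close. */
static const struct { const char *name, *open, *close; } cmds[] = {
    {"bold", "<b>", "</b>"},
    {"italic", "<i>", "</i>"},
    {"red", "<span style=\"color:red\">", "</span>"},
    {"blue", "<span style=\"color:blue\">", "</span>"},
};
#define NCMDS (sizeof cmds / sizeof cmds[0])

/* Translates text up to the matching ')' (when nested) or the end of input.
   Returns 0 on success, -1 on a syntax error. */
static int translate(const char **p, Out *o, int nested) {
    while (**p != '\0') {
        char c = **p;
        if (c == ')' && nested) return 0;          /* caller consumes ')' */
        if (c == '\\') {
            char name[32];
            size_t n = 0, i;
            (*p)++;
            while (isalpha((unsigned char)**p) && n < sizeof name - 1)
                name[n++] = *(*p)++;
            name[n] = '\0';
            if (**p != '(') return -1;
            (*p)++;
            for (i = 0; i < NCMDS; i++)
                if (strcmp(cmds[i].name, name) == 0) break;
            if (i == NCMDS) return -1;             /* unknown command */
            put(o, cmds[i].open);
            if (translate(p, o, 1) != 0 || **p != ')') return -1;
            (*p)++;
            put(o, cmds[i].close);
        } else {
            char one[2] = {c, '\0'};
            /* Escape characters that are special in HTML. */
            put(o, c == '<' ? "&lt;" : c == '>' ? "&gt;" : c == '&' ? "&amp;" : one);
            (*p)++;
        }
    }
    return nested ? -1 : 0;                        /* input ended inside \cmd( */
}

int to_html(const char *src, char *dst, size_t cap) {
    Out o = {dst, 0, cap, 0};
    if (cap == 0) return -1;
    dst[0] = '\0';
    put(&o, "<html><body>");
    if (translate(&src, &o, 0) != 0) return -1;
    put(&o, "</body></html>");
    return o.overflow ? -1 : 0;
}
```

You would then write the resulting string to a .html file (for example with fopen and fputs) and open it in a browser. Adding a new command is just one more row in the table.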
The possibilities, as you can see, are infinite.
Hope this helps.
I would like to capture anything a user says to Alexa in text form, exactly like 'Alexa, Simon says...' does. Can someone hint at how such an intent can be implemented?
I looked at this, this and this but the suggested answers don't work for me and there are no concrete 'accepted' answers to any yet.
The LITERAL slot type works only as long as the sample utterance is specified (i.e., hard coded literally). Like the answers in the above threads suggested, I tried to 'train' it by providing 400+ combinations of possible utterances, hoping that it would somehow figure out the rest of the combinations. But, no dice.
My input could be as random as 'TBD-2019-UK', '17_TBD_UK_Leicester', '17_TBD_UK_Leicester 1', '18_TBD_UK_Leicester 2', 'Chicago IL United States', etc. It is a pretty random combination of year, city, state, country, and some other key text in no particular order (let's ignore the special characters for now). Even if 'Chicago IL United States' is specified in the sample utterances, LITERAL is not able to capture something like 'Pittsburgh PA United States' automatically unless that is also hard coded. There is no way I can come up with ALL possible permutations and combinations of year, city, state, country, and other key data points (because that sounds impractical, even ridiculous).
Plus, the user could add more values, so it needs to be smart and dynamic.
The problem is that if no matching intent is found for the utterance, instead of returning the user's speech as text, my Alexa simply does nothing. It just goes off. Any ideas?
Amazon's Alexa service is not designed for dictation. This has been the consistent response from the Developer Evangelists. So, quite simply, you cannot do what you desire: capture free form speech with wide variations.
There are various ways you can 'trick' Alexa into creating a 'generic slot', which I assume those links talk about. But, since it is outside the design parameters of Alexa, they will never perform well, as you have found.
For your use case, I suggest you break down your inputs into several exchanges. Don't use a one-shot invocation, but a dialog. For example:
U: Alexa, open spiffy skill
A: Welcome to spiffy skill. I'd love to do something spiffy for you, but I need some information. You can give it to me by saying city, year, state, or country followed by what you want me to look up.
U: City Cincinnati
A: OK, got city Cincinnati. I need more information to be spiffy. How about the year?
U: Year 2010
A: OK, I've got Cincinnati, 2010. Should I look that up, or do you have more info?
U: Look it up.
A: Got it. So for Cincinnati, 2010 ...
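In a real skill, the interaction model does the matching (for example, a hypothetical CityIntent with a city slot), and the state between turns is carried in session attributes. The core idea of the dialog, accumulating one piece of information per turn and running the lookup only when asked, can be sketched in plain C. The Session struct, the "City ..."/"Year ..." prefix matching, and `handle` are hypothetical stand-ins for illustration, not Alexa Skills Kit APIs.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical per-session state: what the user has told us so far. */
typedef struct {
    char city[64];
    char year[8];
    char state[32];
    char country[64];
} Session;

/* Hypothetical spoken prefixes, one per piece of information. */
static const char *prefixes[] = {"City ", "Year ", "State ", "Country "};

/* Handles one user turn: stores "City Cincinnati"-style input in the session
   and writes the reply. Returns 1 when the lookup should run, otherwise 0. */
int handle(Session *s, const char *utterance, char *reply, size_t cap) {
    char *slots[] = {s->city, s->year, s->state, s->country};
    size_t sizes[] = {sizeof s->city, sizeof s->year, sizeof s->state, sizeof s->country};

    for (int i = 0; i < 4; i++) {
        size_t n = strlen(prefixes[i]);
        if (strncmp(utterance, prefixes[i], n) == 0) {
            snprintf(slots[i], sizes[i], "%s", utterance + n);
            snprintf(reply, cap, "OK, got %s. Should I look that up, or do you have more info?", slots[i]);
            return 0;
        }
    }
    if (strcmp(utterance, "Look it up") == 0) {
        char summary[192] = "";  /* large enough for all four fields plus ", " */
        for (int i = 0; i < 4; i++) {
            if (slots[i][0] == '\0') continue;
            if (summary[0] != '\0') strcat(summary, ", ");
            strcat(summary, slots[i]);
        }
        if (summary[0] == '\0') {
            snprintf(reply, cap, "I need at least one piece of information first.");
            return 0;
        }
        snprintf(reply, cap, "Got it. So for %s ...", summary);
        return 1;
    }
    snprintf(reply, cap, "Say city, year, state, or country, followed by the value.");
    return 0;
}
```

Each turn only has to recognize a short, predictable phrase, which is the kind of input Alexa's interaction model handles well, instead of one free-form utterance.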
Is a wave limited to the sharing of textual information (HTML), or am I correct in assuming that a wave can contain arbitrary data (represented in XML), so long as it also contains the javascript necessary to render it in a meaningful way?
I ask because the collaborative document preparation demonstrated in the Google I/O video looks very powerful, but there are many other types of documents besides simple RTF text. In my case, I would like to develop Gantt charts interactively.
There is a lot that can be done inside each Wave. Not all features have been made available yet, but here is a link to some samples: http://wave-samples-gallery.appspot.com/ which includes my Slashdot Gadget: http://wave-samples-gallery.appspot.com/about_app?app_id=18006
The Slashdot Gadget actually takes the RSS feed for Slashdot and displays the latest headlines.
Here is the XML: http://www.m1cr0sux0r.com/slashdot.xml
I got access to Google Wave a few days ago, and here's what the raw data for their Sokoban game (which supports two players playing simultaneously on the same board) looks like, for example:
<blip>
  <p _t="title">
  </p>
  <p>
    <w:gadget author="blixt#wavesandbox.com" prefs="" state="" title="" url="http://sokoban-server.appspot.com/com.example.simplegadget.client.SokobanGadget.gadget.xml">
      <w:pref name="playerAllocation" value="1 1,blixt">
      </w:pref>
      <w:pref name="totalMoves" value="8">
      </w:pref>
      <w:pref name="playerPositions" value="1 4,2">
      </w:pref>
      <w:pref name="rockPositions" value="6 2,2 3,2 14,2 15,2 16,2 4,3">
      </w:pref>
    </w:gadget>
  </p>
</blip>
So yes, you can store any data you like in a single blip, with the possibility to go backwards in "time" to see older versions of the data etc.
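Note that the gadget state above is just strings, so the gadget decides its own encoding. Judging from this sample, each position pref appears to be a count followed by that many x,y pairs (six pairs for rockPositions, one for playerPositions); that reading is an inference from the data above, not a documented format. A minimal decoding sketch in C under that assumption (a real gadget would do this in JavaScript; `parse_positions` is a hypothetical helper):

```c
#include <stdio.h>

/* Hypothetical helper: parses a value such as "6 2,2 3,2 14,2 15,2 16,2 4,3",
   assumed to be a count followed by that many x,y pairs.
   Returns the number of pairs read, or -1 on malformed input. */
int parse_positions(const char *value, int xs[], int ys[], int max) {
    int count, used;
    if (sscanf(value, "%d%n", &count, &used) != 1 || count < 0 || count > max)
        return -1;
    value += used;                      /* skip past the count */
    for (int i = 0; i < count; i++) {
        if (sscanf(value, " %d,%d%n", &xs[i], &ys[i], &used) != 2)
            return -1;
        value += used;                  /* skip past this x,y pair */
    }
    return count;
}
```

The playerAllocation pref ("1 1,blixt") pairs a number with a player name instead, so it would need its own parser.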
By the way, if you're interested in seeing some code for a robot that sits in a wave and interacts with users, I made one for a game I'm developing: Google Code Project for multifarce (the game itself isn't really public yet and, as such, isn't particularly functional). The bot source is here: multifarce Wave robot source
Basically, all you need to get a bot running are the last 14 lines of that code. I love it! =)