Watson text to speech latency - ibm-watson

I'm using the Watson Text to Speech service from Node.js. Instead of downloading the audio file, I'm streaming it like this:
text_to_speech.synthesize(params)
  .pipe(new ogg.Decoder())
  .on('stream', function (opusStream) {
    opusStream.pipe(new opus.Decoder())
      .pipe(new Speaker());
  });
It works, but the latency is too high: I hear the sound 5 to 8 seconds after the request.
Is this normal? Does anyone have an idea how to improve my code?
I know that streaming needs some time to load and start playing, but I'd like to reduce this latency.
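One way to see whether those 5 to 8 seconds are spent waiting for the service or in the local ogg/opus decoding and playback chain is to timestamp the first chunk that comes back from `synthesize`. A minimal sketch, assuming plain Node.js streams; `firstChunkTimer` is a hypothetical helper, not part of the Watson SDK.

```javascript
const { Transform } = require('stream');

// Hypothetical helper: a pass-through stream that records how long it took
// (in ms, measured from startedAt) for the first chunk of data to arrive.
function firstChunkTimer(startedAt = Date.now()) {
  let resolveFirst;
  const firstChunkMs = new Promise((resolve) => {
    resolveFirst = resolve;
  });
  let seen = false;

  const tap = new Transform({
    transform(chunk, encoding, callback) {
      if (!seen) {
        seen = true;
        resolveFirst(Date.now() - startedAt);
      }
      callback(null, chunk); // forward the data unchanged
    },
    flush(callback) {
      // The source ended without any data: report null instead of hanging.
      if (!seen) resolveFirst(null);
      callback();
    },
  });

  return { tap, firstChunkMs };
}
```

To use it, create the timer just before calling `text_to_speech.synthesize(params)`, add `.pipe(tap)` before `.pipe(new ogg.Decoder())`, and log the value of `firstChunkMs` once it resolves. If the first chunk itself already takes several seconds to arrive, the delay is upstream (network or synthesis) rather than in the decoders or the Speaker.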

Related

"progressively" load video in NextJS (from DatoCMS/mux)

I'm using DatoCMS and NextJS to build a website. DatoCMS uses Mux behind the scenes to process the video.
The video that comes through is fairly well optimised for whatever browser is being used, and potentially for ABR with HLS; however, it can still take a fair bit of time on the initial load.
The JSON from Dato includes some other potentially useful things:
{
  "video": {
    "mp4Url": "https://stream.mux.com/6V48g3boltSf5uQRB8HnelvtPglzZzYu/medium.mp4",
    "streamingUrl": "https://stream.mux.com/6V48g3boltSf5uQRB8HnelvtPglzZzYu.m3u8",
    "thumbnailUrl": "https://image.mux.com/6V48g3boltSf5uQRB8HnelvtPglzZzYu/thumbnail.jpg"
  },
  "id": "44785585",
  "blurUpThumb": ""
}
With either next/image, or the more proprietary react-datocms/image, that blurUpThumb could be used as a placeholder while the full image is being loaded in the background, to improve UX, and (I believe) page-load speed / time to interactive.
Is there a way to achieve the same effect with a video instead of an image?
The usual way an ABR video (HLS, DASH, etc.) starts faster is by beginning with one of the lower resolutions and stepping up to a higher resolution after the first couple of segments, once the video is playing and there is more time to buffer.
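To make the start-low-then-step-up idea concrete, here is a deliberately simplified model in JavaScript. It is not Mux's or any particular player's algorithm: the rendition ladder, the two-segment warm-up and the 0.8 safety factor are all assumptions chosen for illustration.

```javascript
// Hypothetical rendition ladder, sorted from lowest to highest bitrate (bits/s).
const RENDITIONS = [
  { name: '270p', bitrate: 400000 },
  { name: '480p', bitrate: 1000000 },
  { name: '720p', bitrate: 2500000 },
  { name: '1080p', bitrate: 5000000 },
];

// Pick the rendition for a given segment index.
// - The first `warmupSegments` segments (or any segment without a bandwidth
//   estimate yet) use the lowest rendition, so playback can start quickly.
// - Afterwards, pick the highest rendition whose bitrate fits within a safety
//   fraction of the measured bandwidth.
function chooseRendition(segmentIndex, measuredBps, renditions = RENDITIONS, warmupSegments = 2, safety = 0.8) {
  if (segmentIndex < warmupSegments || measuredBps == null) {
    return renditions[0];
  }
  const budget = measuredBps * safety;
  let best = renditions[0]; // fall back to the lowest rendition if nothing fits
  for (const r of renditions) {
    if (r.bitrate <= budget) best = r; // ladder is ascending, so the last fit is the highest
  }
  return best;
}
```

For example, with a 10 Mbit/s estimate, segments 0 and 1 would still be fetched at 270p and segment 2 onward at 1080p. On a 13-second video split into short segments, that warm-up is a noticeable fraction of the whole clip.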
However, in your case the example video is very short (13 seconds), so the effect is minimal. Playing it in Safari on a Mac, I saw the step happen at 4 seconds, which is already almost a third of the way through.
Short of re-encoding at a lower resolution or with special codecs, I think you may find this hard to beat; Mux is a pretty mature video streaming service.
The direct links to the videos above loaded and played quite quickly for me, even over a relatively slow internet connection. It might be worth looking at what else your page is loading at the same time, as this may be competing for bandwidth and slowing things down.

Alexa Skill: how to play mp3 longer than 4 min

I've written a rather simple Alexa Skill that plays pre-recorded tales for children, recorded by professional actors (much better than Alexa's mechanical voice). Everything works fine: you can choose which kind of story you want, and the story is chosen randomly from an array.
The problem is that I'm currently playing the mp3 using SSML, and that limits the audio file to a maximum of 4 minutes.
I could cut the longer stories into multiple .mp3 files, but I don't know how to create a "progressive reply".
Any suggestion?
There are certain limitations on embedding the audio tag in SSML: the audio file cannot be longer than 240 seconds.
If your stories are longer than 4 minutes, consider switching to an AudioPlayer response. The AudioPlayer interface lets you play longer mp3 files. With the AudioPlayer interface you can also play one story after another, or repeat one.
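As a minimal sketch of what an AudioPlayer response looks like, here is a JavaScript function that builds the raw JSON response with an `AudioPlayer.Play` directive, without using the ASK SDK. The function name, URL and token are hypothetical; the stream URL has to be HTTPS, and the AudioPlayer interface has to be enabled for the skill in the developer console.

```javascript
// Build a raw Alexa skill response that starts streaming one audio file
// through the AudioPlayer interface instead of an SSML <audio> tag.
// `url` (HTTPS) and `token` are hypothetical values supplied by the skill.
function buildPlayResponse(url, token, offsetInMilliseconds = 0) {
  return {
    version: '1.0',
    response: {
      directives: [
        {
          type: 'AudioPlayer.Play',
          playBehavior: 'REPLACE_ALL', // stop anything playing and start this file
          audioItem: {
            stream: { url, token, offsetInMilliseconds },
          },
        },
      ],
      shouldEndSession: true, // playback continues after the session ends
    },
  };
}
```

The token identifies the track in later AudioPlayer requests, so it is worth encoding something meaningful in it (for example which story and which part).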
The AudioPlayer interface provides directives and requests for streaming audio and monitoring playback progression. Your skill can send directives to start and stop the playback. The Alexa service sends your skill AudioPlayer requests to give you information about the playback state, such as when the track is nearly finished, or when playback starts and stops.
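To get the "progressive reply" from the question, one option is to split a long story into several files and answer the `AudioPlayer.PlaybackNearlyFinished` request by enqueuing the next part. A minimal sketch in JavaScript, assuming the raw request/response JSON (no ASK SDK); the part URLs, the token scheme and the function names are hypothetical.

```javascript
// Hypothetical parts of one long story, each a separate HTTPS mp3 file.
const STORY_PARTS = [
  'https://example.com/stories/tale-part1.mp3',
  'https://example.com/stories/tale-part2.mp3',
  'https://example.com/stories/tale-part3.mp3',
];

// Tokens encode the part index so the next part can be found from the current one.
function tokenFor(index) {
  return `tale-part-${index}`;
}

function indexFromToken(token) {
  const match = /^tale-part-(\d+)$/.exec(token || '');
  return match ? Number(match[1]) : -1;
}

// Handle an AudioPlayer request (the `request` object of the incoming JSON).
function handleAudioPlayerRequest(request, parts = STORY_PARTS) {
  const empty = { version: '1.0', response: {} };
  if (request.type !== 'AudioPlayer.PlaybackNearlyFinished') return empty;

  const current = indexFromToken(request.token);
  const next = current + 1;
  if (current < 0 || next >= parts.length) return empty; // unknown token or last part

  return {
    version: '1.0',
    response: {
      directives: [
        {
          type: 'AudioPlayer.Play',
          playBehavior: 'ENQUEUE', // play after the current track finishes
          audioItem: {
            stream: {
              url: parts[next],
              token: tokenFor(next),
              expectedPreviousToken: request.token, // required for ENQUEUE
              offsetInMilliseconds: 0,
            },
          },
        },
      ],
    },
  };
}
```

Responses to AudioPlayer requests should only carry AudioPlayer directives (no speech), which is why the enqueue response above contains nothing else.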
More on the audio tag here
More on the AudioPlayer interface here

Medium to large file uploads with progress updates in AspNet Core

By medium to large I mean anything from 10 MB to 200 MB (sound files, if that matters).
Basically, I want to make an API that does some spectral analysis on the file itself, which requires a file upload. For UI/UX reasons it would be nice to have a progress bar for the upload. What are the common architectures for achieving this interaction?
The client application uploading the file will be a JavaScript client (React/Redux) and the API is written in ASP.NET Core. I have seen some examples that use WebSockets to update the client on progress, and other examples where the client polls a resource URL for status updates. Are there any best practices (or a "modern way of doing this") that I should know of? TIA
In general, you just need to save the progress status to some variable while reading the input stream in your controller (a session-specific variable, because several uploads might be in progress at the same time), and then get this status from the client side via AJAX requests (or SignalR).
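The pattern is language-neutral: update a per-upload progress value while reading the request stream, and let the client poll it. Here is a minimal sketch of it in JavaScript for Node.js rather than ASP.NET Core; `progressById`, `consumeUpload` and `getProgress` are hypothetical names, and `totalBytes` is assumed to come from the request's Content-Length header.

```javascript
// Hypothetical in-memory store: upload id -> percent complete (0..100).
// Keying by id keeps concurrent uploads from overwriting each other.
const progressById = new Map();

// Read an upload stream chunk by chunk, updating the stored percentage.
// totalBytes is assumed to come from the request's Content-Length header.
async function consumeUpload(uploadId, stream, totalBytes) {
  let received = 0;
  progressById.set(uploadId, 0);
  for await (const chunk of stream) {
    received += chunk.length;
    const percent = totalBytes > 0 ? Math.round((received / totalBytes) * 100) : 0;
    progressById.set(uploadId, Math.min(percent, 100)); // clamp if Content-Length was wrong
  }
  return received;
}

// What a status endpoint polled by the client would return; null if unknown.
function getProgress(uploadId) {
  return progressById.has(uploadId) ? progressById.get(uploadId) : null;
}
```

A status endpoint would return `getProgress(id)` to the client's polling requests (or push it over SignalR/WebSockets). In a real service, entries should also be removed some time after an upload finishes so the map does not grow forever.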
You could take a look at this example: https://github.com/DmitrySikorsky/AspNetCoreUploadingProgress
I have tried 11 MB files with no problems. The sample contains this line:
await Task.Delay(10); // It is only to make the process slower
Don't forget to remove it in a real solution.
In this sample, files are uploaded via AJAX, so I didn't try really large files, but you can use the iframe solution from this sample:
https://github.com/DmitrySikorsky/AspNetCoreFileUploading
The other part will be almost the same.
Hope this helps. Feel free to ask if you have any additional questions.

What audio format works for Silverlight + WPF?

I'm writing a pair of applications for distributing audio (among other features). I have a WPF program that allows an artist to record and edit audio. Clicking a button then uploads this to a Silverlight-powered website. A consumer visiting this website can then listen to the audio. Simple. It works. But I'd like it to be better: I need an audio format that works seamlessly on both the recording and playback sides.
I'm currently using the MP3 format, and I'm not happy with it. For recording and editing, I use the Alvas Audio C# library. It works OK, but MP3 recording requires the artist to go into the registry and change msacm.l3acm to l3codecp.acm. That's a lot to ask of an end user. Furthermore, MP3 recording seems rather fragile when I install on a new machine. (Sometimes it just randomly doesn't work until you've fiddled around for a while. I still don't know why.) I've been told that unless I want to pay royalties to the MP3 patent holders, I will always need to rely on this type of registry change.
So what other audio format could I use instead? I need something compressed. Alvas Audio can also record to GSM, for example, but that won't play back in Silverlight. Silverlight will play WMA, but I don't know how to record in that format; Alvas Audio won't. I'd be open to using another recording library instead, but I haven't managed to find one.
Am I missing something obvious, or is there really no user-friendly way to record audio in WPF and play it back in Silverlight? It seems like there should be...
Any suggestions greatly appreciated.
Thanks.
IMO, WMA would be your best bet. I'm not sure how your application is set up or how low-level you want to go, but the Windows Media Format SDK is a great way to encode WMA, and the runtimes come with Windows. There are .NET PIAs and samples for it here: http://windowsmedianet.sourceforge.net/
Given that Ogg Vorbis is being adopted for the new HTML audio tag in (cough) some browsers, it's probably worth checking it out. You won't get bitten by any licensing concerns if you follow this route. If ease of deployment is top of your list, then go with WMA.
[tries hard not to start ranting about the fragmented state of codec options in browsers and the commercial interests that scupper any consensus]

Getting Silverlight Video Stream

Suppose there is a Silverlight streaming video player on a random website. How can I intercept the video stream and, for example, save it to a file, i.e. get the real source of the video?
I know some sites embed the source in a tag, or at least that was the case with Flash. But sometimes players are smarter than that and call some logic via a web service. It is still possible to figure everything out by analyzing the .dll with Reflector, but that is hardcore! Every player may have different logic, so I figured it would be easier to just capture the current stream somehow.
Any thoughts?
Ooook! I got an answer that can be used as a nice workaround. Using Fiddler, I was able to capture the traffic and figure out what's going on. Now I'm happily watching the same video as before, only using the uber feature of WMP that lets me play videos faster.
