Speaker labeling is not consistent, even with only 2 speakers - ibm-watson

When I try transcribing audio to text in the demo, it is very accurate. This is the output from the demo:
Speaker 0:
Hello.
Speaker 1:
Hi is this Tina.
Speaker 0:
Yes it is who is this.
This is my output:
Speaker 0:
Hello.
Speaker 1:
Hi is this Tina.
Speaker 0:
Yes it is this this
This is my setup in recognize:
private RecognizeOptions getRecognizeOptions(InputStream captureStream) {
    return new RecognizeOptions.Builder()
        .audio(captureStream)
        .contentType(HttpMediaType.AUDIO_MP3)
        .model("en-US_NarrowbandModel")
        .interimResults(true)
        .inactivityTimeout(-1)
        .timestamps(true)
        .speakerLabels(true)
        .smartFormatting(true)
        .build();
}
When I try changing the model to en-US_BroadbandModel, this is the output:
Speaker 0:
Hello.
Speaker 1:
Hi is this Tina. Yes it is who is this
The difference is that the words "Yes it is who is this" belong to a different speaker, so the expected result would be this:
Speaker 0:
Hello.
Speaker 1:
Hi is this Tina.
Speaker 0:
Yes it is who is this.
Please help: is this a bug, or is there an error in my code? By the way, I am using an MP3 file, not a WAV file.

What you are discovering is that the sampling rate of the audio is significant when transcribing.
From the documentation - https://console.bluemix.net/docs/services/speech-to-text/index.html#about
Use broadband for audio that is sampled at a minimum rate of 16 kHz.
Use narrowband for audio that is sampled at a minimum rate of 8 kHz.
Consequently, audio that is sampled at 8 kHz is not going to be transcribed as well when using a broadband model.
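As a rough sketch, you could pick the model to match the known sample rate of your source audio before building the options. The sampleRateHz parameter here is a hypothetical addition; the other builder calls are unchanged from your setup:

private RecognizeOptions getRecognizeOptions(InputStream captureStream, int sampleRateHz) {
    // Broadband models expect audio sampled at >= 16 kHz;
    // narrowband models expect 8 kHz audio (per the docs quoted above).
    String model = sampleRateHz >= 16000 ? "en-US_BroadbandModel" : "en-US_NarrowbandModel";
    return new RecognizeOptions.Builder()
        .audio(captureStream)
        .contentType(HttpMediaType.AUDIO_MP3)
        .model(model)
        .interimResults(true)
        .inactivityTimeout(-1)
        .timestamps(true)
        .speakerLabels(true)
        .smartFormatting(true)
        .build();
}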

Related

Silverlight Plug-in Crashes in Video Conference

We have developed a video conferencing application using Silverlight.
It works properly for 15 to 19 minutes; then the video stops and the Silverlight plug-in crashes.
For video encoding we are using the JPEG encoder: a single image from CaptureSource gets encoded and sent on each tick of a timer.
I also tried to use SilverSuite, but a popup message appears saying SilverSuite has expired.
Is there a proper solution for the encoding, the timer, or the plug-in?
Thanks.
We extended the time before crashing from 15 minutes to between 1 and 1.5 hours by flushing the memory stream and decreasing the receiving buffer size.
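As a rough illustration of the flushing idea, here is a minimal sketch; all names are hypothetical, and it assumes each encoded frame is buffered in a MemoryStream before sending:

private MemoryStream _frameStream = new MemoryStream();
private const int MaxBufferBytes = 1 << 20;  // 1 MB cap (arbitrary assumption)

private void OnTimerTick(object sender, EventArgs e)
{
    _frameStream.SetLength(0);            // drop previously buffered bytes
    EncodeCurrentFrameTo(_frameStream);   // hypothetical JPEG-encode step
    Send(_frameStream.ToArray());         // hypothetical network send
    if (_frameStream.Capacity > MaxBufferBytes)
    {
        // Recreate the stream so the backing buffer does not grow unbounded.
        _frameStream.Dispose();
        _frameStream = new MemoryStream();
    }
}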

Synchronizing audio and video

I need to display streaming video using MediaElement in a Windows Phone application.
I'm getting a stream from a web service that contains frames in H.264 format AND raw AAC bytes (strangely, ffmpeg can parse it only with the -f ac3 parameter).
If I try to play only one of the streams (audio OR video), it plays fine. But I have issues when I try both.
For example, if I report the video samples without timestamps and report the audio with timestamps, my video plays 3x-5x faster than it should.
MediaStreamSample msSamp = new MediaStreamSample(
    _videoDesc,
    vStream,
    0,
    vStream.Length,
    0,
    _emptySampleDict);
ReportGetSampleCompleted(msSamp);
From my web service I am getting DTS and PTS values for the video and audio frames in the following format:
120665029179960
But when I set these on the samples, my audio stream plays too slowly and with delays.
The timebase is 90 kHz.
Could someone tell me how I can resolve this? Maybe I should calculate different timestamps for the samples? If so, please show me the way.
Thanks.
Okay, I solved it.
What I needed to do to sync A/V:
Calculate the right timestamp for each video and audio frame using the frame rate.
For example, with 90 kHz for video, 48 kHz for audio, and 25 frames per second, my frame increments are:
_videoFrameTime = (int)TimeSpan.FromSeconds((double)0.9 / 25).Ticks;
_audioFrameTime = (int)TimeSpan.FromSeconds((double)0.48 / 25).Ticks;
And now we should add these values for each sample:
private void GetAudioSample()
{
    ...
    /* Getting sample from buffer */
    MediaStreamSample msSamp = new MediaStreamSample(
        _audioDesc,
        audioStream,
        0,
        audioStream.Length,
        _currentAudioTimeStamp,
        _emptySampleDict);
    _currentAudioTimeStamp += _audioFrameTime;
    ReportGetSampleCompleted(msSamp);
}
The method for getting a video frame is the same, incrementing by _videoFrameTime instead; a sketch follows below.
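A minimal sketch of the video counterpart, assuming fields analogous to the audio ones (_videoDesc, videoStream, _currentVideoTimeStamp) exist; the names simply mirror the audio version above:

private void GetVideoSample()
{
    /* Getting sample from buffer (mirrors GetAudioSample above) */
    MediaStreamSample msSamp = new MediaStreamSample(
        _videoDesc,
        videoStream,
        0,
        videoStream.Length,
        _currentVideoTimeStamp,
        _emptySampleDict);
    _currentVideoTimeStamp += _videoFrameTime;
    ReportGetSampleCompleted(msSamp);
}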
Hope this will be helpful for someone.
Roman.

Control servo with keyboard or other hardware buttons?

I have just gotten started with Arduino and barely have any idea about the more advanced stuff. It seems pretty straightforward. Now, I'm one who usually likes to integrate two devices, so I was wondering if I could control a servo with the computer's keyboard or with two hardware push buttons attached to the Arduino board.
In case it helps, I'm using an Arduino Uno board. Here is the example code I am using to sweep the servo for now:
// Sweep
// by BARRAGAN <http://barraganstudio.com>
// This example code is in the public domain.
#include <Servo.h>

Servo myservo;  // create servo object to control a servo
                // a maximum of eight servo objects can be created
int pos = 0;    // variable to store the servo position

void setup()
{
  myservo.attach(11);  // attaches the servo on pin 11 to the servo object
}

void loop()
{
  for(pos = 0; pos < 45; pos += 1)   // goes from 0 degrees to 45 degrees
  {                                  // in steps of 1 degree
    myservo.write(pos);              // tell servo to go to position in variable 'pos'
    delay(10);                       // waits 10 ms for the servo to reach the position
  }
  for(pos = 45; pos >= 1; pos -= 1)  // goes from 45 degrees back to 0 degrees
  {
    myservo.write(pos);              // tell servo to go to position in variable 'pos'
    delay(10);                       // waits 10 ms for the servo to reach the position
  }
}
Now, let's say I wanted to change the servo's angle by pressing the left/right arrow keys on my computer's keyboard. How would I go about doing that?
Alternatively, what if I attached two push buttons to the Arduino, and pressing one would move the servo either left or right depending on the button? Which ports would I plug the buttons into? Any code samples or diagrams would greatly help!
To move a servo attached to an Arduino attached to a computer, you will need two components.
You will need software on your computer to accept keyboard commands and send commands to the Arduino via the serial port. I would recommend a language like Python or Java for that, as a simple app can be written quite easily.
Check this playground link for an example using Java. And for an example in Python, check out this project.
There is a bug/feature built into the Arduino that will give you grief as you go on here. The Arduino is designed to auto-reset when a serial connection is made to it via USB. This page has a detailed description of the issue and cites several ways to deal with it.
You will need to modify the sketch on the Arduino to listen to the serial port and adjust the servo's position based on the commands received from your computer. Check out the Python link above. It is a complete (hardware, PC software, and Arduino sketch) project designed to do something very similar to what you are trying to do.
I recommend you start with either component and try to get it going. As you run into problems, post your code and someone will be glad to help further.
As for the second question, adding buttons to the Arduino is fairly simple. You will connect them to digital inputs; a minimal sketch follows below. There are hundreds of examples on the web. Search for "add button to arduino" and see what you get. (lol... 1.3 million hits). Here again, try it and post specifics for more help.
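As a rough starting point, here is a minimal sketch (untested; the pin choices are arbitrary) for two push buttons wired between the pins and ground, using the Arduino's internal pull-up resistors:

#include <Servo.h>

Servo myservo;
const int leftButtonPin = 2;    // arbitrary digital pins
const int rightButtonPin = 3;
int pos = 90;                   // start at the midpoint

void setup()
{
  myservo.attach(11);
  pinMode(leftButtonPin, INPUT_PULLUP);   // buttons wired to ground
  pinMode(rightButtonPin, INPUT_PULLUP);
}

void loop()
{
  // With INPUT_PULLUP, a pressed button reads LOW.
  if (digitalRead(leftButtonPin) == LOW && pos > 0)
    myservo.write(--pos);
  if (digitalRead(rightButtonPin) == LOW && pos < 180)
    myservo.write(++pos);
  delay(15);                    // give the servo time to move
}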
For serial communication, use PuTTY; it is a cross-platform serial and SSH client.
For the left and right arrow commands: there is no single ASCII character for an arrow key. Instead, the terminal sends a multi-byte escape sequence, and since the Arduino reads one byte at a time, it sees:
--> : 27, 91, 67
<-- : 27, 91, 68
So it is not that simple to read.
You could use something like this:
int pos = 0;
Serial.flush();                 // flush all received data
while(Serial.available() < 3);  // wait for the 3 bytes of the escape sequence
if(Serial.read() == 27) {       // first byte: ESC
  if(Serial.read() == 91) {     // second byte: '['
    switch (Serial.read()) {    // third byte identifies the key
      case 67:                  // right arrow
        myservo.write(++pos);   // increment pos by 1 before writing it
        break;
      case 68:                  // left arrow
        myservo.write(--pos);   // decrement pos by 1 before writing it
        break;
      case 65:                  // up arrow
        myservo.write(++pos);   // increment pos by 1 before writing it
        break;
      case 66:                  // down arrow
        myservo.write(--pos);   // decrement pos by 1 before writing it
        break;
      default:
        break;
    }
  }
}
But this is not a good solution, because the arrow key is sent as 3 bytes, and when you flush you can flush away the 27, so you read something like 91, 67, 27; that is not a valid sequence, so it doesn't work.
You could write an algorithm to extract the arrow command from a window of several received bytes, or you can use '4' to move left and '6' to move right; those are plain single ASCII characters, and on a numeric keypad those keys have arrows drawn on them. A sketch of that simpler approach follows below.
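A minimal sketch of the single-byte approach (assuming the same myservo and pos as above, and that Serial.begin() has been called in setup()):

void loop()
{
  if (Serial.available() > 0) {
    char c = Serial.read();
    if (c == '4' && pos > 0)          // '4' on the keypad: move left
      myservo.write(--pos);
    else if (c == '6' && pos < 180)   // '6' on the keypad: move right
      myservo.write(++pos);
  }
}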

Windows Phone 7.1 play recorded PCM/WAV audio

I'm working on a WP7.1 app that records audio and plays it back. I'm using a MediaElement to play back audio. The MediaElement works fine for playing MP4 files (actually M4A files renamed) downloaded from the server. However, when I try to play a recorded file, with or without the WAV RIFF header (PCM in both cases), it does not work. It gives me error code 3001, for which I cannot find the definition anywhere.
Can anyone point me to some sample code for playing recorded audio in WP7.1 that does not use the SoundEffect class? I don't want to use the SoundEffect class because it's meant for short audio clips.
This is how I load the audio file:
using (IsolatedStorageFile storage = IsolatedStorageFile.GetUserStoreForApplication())
{
    using (Stream stream = storage.OpenFile(audioSourceUri.ToString(), FileMode.Open))
    {
        m_mediaElement.SetSource(stream);
    }
}
The playback code looks good; the issue has to be in the code that stores the file. BTW, 3001 means AG_E_INVALID_FILE_FORMAT.
I just realized that the "average bytes per second" RIFF header value was wrong. I was using the wrong value for bits per sample, which should have been 16 since the microphone records 16-bit PCM.
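For reference, a minimal sketch of writing a 44-byte PCM WAV (RIFF) header; the method itself is hypothetical, but the field layout is the standard one, and "average bytes per second" is sampleRate * channels * bitsPerSample / 8:

static void WriteWavHeader(BinaryWriter w, int sampleRate, short channels,
                           short bitsPerSample, int dataLength)
{
    int byteRate = sampleRate * channels * (bitsPerSample / 8);
    short blockAlign = (short)(channels * (bitsPerSample / 8));

    w.Write(System.Text.Encoding.ASCII.GetBytes("RIFF"));
    w.Write(36 + dataLength);          // total chunk size after "RIFF"
    w.Write(System.Text.Encoding.ASCII.GetBytes("WAVE"));
    w.Write(System.Text.Encoding.ASCII.GetBytes("fmt "));
    w.Write(16);                       // fmt sub-chunk size for PCM
    w.Write((short)1);                 // audio format 1 = PCM
    w.Write(channels);
    w.Write(sampleRate);
    w.Write(byteRate);                 // the "average bytes per second" field
    w.Write(blockAlign);
    w.Write(bitsPerSample);            // 16 for the WP7 microphone
    w.Write(System.Text.Encoding.ASCII.GetBytes("data"));
    w.Write(dataLength);               // size of the PCM data that follows
}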

Is it possible to capture both mic and line-in at the same time using ALSA?

I'm not terribly familiar with ALSA, but I'm supporting an application that uses it.
Is it possible to record audio from both the mic and line-in simultaneously? Not necessarily mixing the audio, though that is a possibility that has been requested. Can both be set to record, and ALSA used to read each individually?
The documentation on ALSA is not terribly helpful, and this is basically my first sojourn into sound mixing on Linux using ALSA.
Any and all help would be greatly appreciated; I'm hoping there is someone out there who has done something like this in the past and either has a sample to share or a link to point me in the right direction.
Maybe this can be done. I'm not sure, and I haven't tested it, but based on http://www.jrigg.co.uk/linuxaudio/ice1712multi.html, this will give you one virtual device with 4 channels:
pcm.multi_capture {
    type multi
    slaves.a.pcm hw:0
    slaves.a.channels 2
    slaves.b.pcm hw:1
    slaves.b.channels 2
    bindings.0.slave a
    bindings.0.channel 0
    bindings.1.slave a
    bindings.1.channel 1
    bindings.2.slave b
    bindings.2.channel 0
    bindings.3.slave b
    bindings.3.channel 1
}
I don't know whether you can mix them with route, or which of these is the correct syntax:
pcm.route_capture {
    type route
    slave.pcm "multi_capture"
    ttable.0.0 0.5
    ttable.1.1 0.5
    ttable.0.2 0.5
    ttable.1.3 0.5
}
or
pcm.route_capture {
    type route
    slave.pcm "multi_capture"
    ttable.0.0 0.5
    ttable.1.1 0.5
    ttable.2.0 0.5
    ttable.3.1 0.5
}
If someone tests this, please tell us the results. Thank you, and I wish you luck!
arecord -l will give you a list of available capture devices. In my case:
**** List of CAPTURE Hardware Devices ****
card 0: M2496 [M Audio Audiophile 24/96], device 0: ICE1712 multi [ICE1712 multi]
Subdevices: 1/1
Subdevice #0: subdevice #0
So, with my card, you would be out of luck - there is only one device (i.e. only one distinct source). This device will give you all data routed to it by hardware, as configured by an external mixer application.
With some cards it might, however, be possible to route MIC to channel 1 (left) and LINE to channel 2 (right), and then record 2 channels, separating them as needed in your application. Of course, if supported by the hardware, you could also use two channels for each source and record four channels.
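As an illustration of that approach (assuming your mixer routes the mic to the left channel and line-in to the right), you could capture one stereo file and split the channels afterwards, e.g. with sox:

# Record 2 channels of 16-bit, 44.1 kHz audio from card 0
arecord -D hw:0 -f S16_LE -r 44100 -c 2 both.wav

# Split into per-source mono files (remix selects channels)
sox both.wav mic.wav  remix 1
sox both.wav line.wav remix 2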
