How to set up a dataset and train a Whisper model from OpenAI with a custom dataset [duplicate]

I use OpenAI's Whisper Python library for speech recognition. I have some training data: either text only, or audio plus the corresponding transcription. How can I fine-tune a model from OpenAI's Whisper ASR on my own training data?

From https://github.com/openai/whisper/discussions/64, the released code doesn't contain the training/fine-tuning part, so you would have to write that yourself to train or fine-tune a Whisper model on your own data.
Also, from https://openai.com/blog/whisper/:
We are open-sourcing models and inference code to serve as a foundation for building useful applications and for further research on robust speech processing.
No training code mentioned.
William Castrillon and nizata pointed to the following fine-tuning code created by third-party developers:
https://huggingface.co/blog/fine-tune-whisper (code)
https://github.com/openai/whisper/discussions/64
https://huggingface.co/spaces/openai/whisper/discussions/6
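
In outline, the Hugging Face route from the first link looks like this (a condensed sketch: the model size and hyperparameters are illustrative, and train_dataset / data_collator stand in for the preprocessing steps the blog post walks through):

```python
# Condensed sketch of the Hugging Face fine-tuning route; train_dataset and
# data_collator are placeholders for the blog post's preprocessing steps.
from transformers import (
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
    WhisperForConditionalGeneration,
    WhisperProcessor,
)

processor = WhisperProcessor.from_pretrained("openai/whisper-small")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-small-finetuned",
    per_device_train_batch_size=8,
    learning_rate=1e-5,
    max_steps=1000,
    fp16=True,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,  # audio + transcription pairs, preprocessed
    data_collator=data_collator,  # pads input features and label ids
    tokenizer=processor.feature_extractor,
)
trainer.train()
```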

Related

Sagemaker Inference Endpoint with fitted Encoder

Since I'm not getting anywhere by reading documentation and blog posts, I'll ask here:
I want to deploy a SageMaker endpoint by fitting a SageMaker Pipeline. I want an endpoint backed by a PipelineModel. This PipelineModel should consist of two models: a fitted model that encodes my data and a model that predicts with an XGBoost estimator. I am following this documentation: enter link description here
But this example doesn't show how to integrate the fitted preprocessor model in a pipeline step. Which step do I have to use? A TrainingStep? Thanks in advance; I am desperate.
Check out this official example: Train, register, and deploy a pipeline model.
There are two variations to keep in mind:
For models that need training (usually those based on TensorFlow/PyTorch), a TrainingStep must be used so that the output (the model artifact) is correctly (and automatically) generated and can be used later for inference.
For models produced by simply fitting on the data (e.g., a scaler with sklearn), you could create a TrainingStep in disguise (an extra component in the pipeline; not strictly correct, but a workaround that works). The more correct method is to configure the preprocessing script so that it internally saves a model.tar.gz with the necessary files (e.g., pickle or joblib objects) inside; that archive can then be used in later steps as model_data. In fact, once you have a model.tar.gz, you can define a Model of various types (e.g., an SKLearnModel) that is already fitted; see the sketch after this list.
At this point, you define your PipelineModel with the trained/fitted models and can either proceed to direct endpoint deployment or go through the model registry for a more robust approach.
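
A minimal sketch of that route, assuming an sklearn scaler as the preprocessor; the bucket paths, role ARN, entry-point script names, and framework versions are placeholders, not values from the official example:

```python
import tarfile

import joblib
import numpy as np
from sklearn.preprocessing import StandardScaler

# Inside the preprocessing script: fit the scaler and package it as
# model.tar.gz so it can serve as model_data later.
X_train = np.random.rand(100, 4)  # stand-in for your real features
scaler = StandardScaler().fit(X_train)
joblib.dump(scaler, "scaler.joblib")
with tarfile.open("model.tar.gz", "w:gz") as tar:
    tar.add("scaler.joblib")

# After uploading model.tar.gz to S3, wire both models into one PipelineModel.
from sagemaker.pipeline import PipelineModel
from sagemaker.sklearn.model import SKLearnModel
from sagemaker.xgboost.model import XGBoostModel

role = "arn:aws:iam::123456789012:role/SageMakerRole"  # placeholder role

preprocessor = SKLearnModel(
    model_data="s3://my-bucket/preprocessor/model.tar.gz",  # placeholder path
    role=role,
    entry_point="preprocess_inference.py",  # loads scaler.joblib, transforms input
    framework_version="1.2-1",
)
xgb = XGBoostModel(
    model_data="s3://my-bucket/xgb/model.tar.gz",  # artifact from the TrainingStep
    role=role,
    entry_point="xgb_inference.py",
    framework_version="1.7-1",
)

pipeline_model = PipelineModel(models=[preprocessor, xgb], role=role)
pipeline_model.deploy(initial_instance_count=1, instance_type="ml.m5.large")
```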

Is there a way to show a PDF in its original structure during human review for custom entity labelling in AWS SageMaker?

I have modified this sample to read PDFs in tabular format. I would like to keep the tabular structure of the original PDF during the human review process. I notice the custom worker task template uses the crowd-entity-annotation element, which seems to read only text. I am aware that the human review process reads from an S3 key that contains raw text written by the Textract process.
I have considered writing to S3 using tabulate, but I don't think that is the best solution. I would like to keep the structure and still be able to annotate custom entities.
Comprehend now natively supports detecting custom-defined entities in PDF documents. To do so, you can try the following steps:
Follow this GitHub readme to start the annotation process for PDF documents.
Once the annotations are produced, use the Comprehend CreateEntityRecognizer API to train a custom entity model for semi-structured documents.
Once the entity recognizer is trained, use the StartEntitiesDetectionJob API to run inference on PDF documents; a minimal boto3 sketch follows.
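
A minimal sketch of that last step with boto3; the job name, S3 paths, and ARNs are placeholders to replace with your own values:

```python
# Minimal boto3 sketch of the inference step; all names/ARNs are placeholders.
import boto3

comprehend = boto3.client("comprehend", region_name="us-east-1")

response = comprehend.start_entities_detection_job(
    JobName="pdf-custom-entities",  # placeholder
    EntityRecognizerArn="arn:aws:comprehend:us-east-1:123456789012:entity-recognizer/my-recognizer",
    DataAccessRoleArn="arn:aws:iam::123456789012:role/ComprehendDataAccessRole",
    LanguageCode="en",
    InputDataConfig={
        "S3Uri": "s3://my-bucket/pdf-inputs/",
        "InputFormat": "ONE_DOC_PER_FILE",
        # Have Comprehend extract the text from the PDFs via Textract.
        "DocumentReaderConfig": {
            "DocumentReadAction": "TEXTRACT_DETECT_DOCUMENT_TEXT",
            "DocumentReadMode": "SERVICE_DEFAULT",
        },
    },
    OutputDataConfig={"S3Uri": "s3://my-bucket/pdf-entities-output/"},
)
print(response["JobId"])
```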

SageMaker Java client: generate recordIO

I am trying to build a training set for SageMaker using the Linear Learner algorithm. This algorithm supports recordIO-wrapped protobuf and CSV as formats for the training data. As the training data is generated using Spark, I am having issues generating a CSV file from a DataFrame (this seems broken for now), so I am trying to use protobuf.
I managed to create a binary file for the training dataset using Protostuff, a library that generates protobuf messages from POJOs. The problem is that when triggering the training job I receive this message from SageMaker:
ClientError: No training data processed. Either the training channel is empty or the mini-batch size is too high. Verify that training data contains non-empty files and the mini-batch size is less than the number of records per training host.
The training file is certainly not empty. I suspect the way I generate the training data is incorrect, as I am able to train models using the libsvm format. Is there a way to generate recordIO using the SageMaker Java client?
Answering my own question: it was an issue in the algorithm configuration. I reduced the mini-batch size and it worked fine.
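
For reference, the same fix expressed with the SageMaker Python SDK (the Java client exposes the equivalent hyperparameter when creating the training job); the role, instance type, batch size, and S3 URI are illustrative:

```python
# Sketch: keep mini_batch_size below the record count per training host.
import sagemaker
from sagemaker import image_uris
from sagemaker.estimator import Estimator

session = sagemaker.Session()
role = "arn:aws:iam::123456789012:role/SageMakerRole"  # placeholder role
image = image_uris.retrieve("linear-learner", session.boto_region_name)

linear = Estimator(
    image_uri=image,
    role=role,
    instance_count=1,
    instance_type="ml.m5.large",
    sagemaker_session=session,
)
linear.set_hyperparameters(
    predictor_type="regressor",
    mini_batch_size=32,  # the setting that caused the error when too high
)
linear.fit({"train": "s3://my-bucket/train/"})  # recordIO-protobuf data
```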

HRTF database used in Resonance

I am doing research on 3D audio plugins for Unity for my Master's thesis at Aalto University, Finland.
I was wondering whether Resonance uses the KEMAR HRTF database or something customized.
Resonance Audio uses custom HRIRs that were derived from the SADIE database. You can refer to the corresponding spatial-audio section in the public spatial-media repo on GitHub for further information and resources.

Virtual Assistants knowledge

I am studying artificial intelligence.
How do virtual assistants analyze questions?
Example: when I said "Way from New York City to Washington DC", the VA opened Google Maps?
If I were developing this myself, I'd probably use a pattern like:
If ({a} {linker} {b}), where a and b are in a list of cities on the map (or something related to that) and the linker is a word like "from" or "to", I would launch Google Maps; likewise if any of the remaining words signal map-related intent, like "location", "map", "route", and so on.
Just a guess, but I think it should work like that; a rough sketch follows.
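
A rough sketch of that pattern idea (illustrative only): match "{a} from/to {b}" against a toy city list, with map-related words as a fallback trigger.

```python
# Toy pattern matcher for the "{a} {linker} {b}" idea above.
import re
import webbrowser
from urllib.parse import quote

KNOWN_CITIES = {"new york city", "washington dc"}  # toy city list
MAP_WORDS = {"way", "route", "map", "location", "directions"}

def handle(utterance: str) -> bool:
    text = utterance.lower()
    match = re.search(r"from (.+?) to (.+)", text)
    if match:
        origin, dest = match.group(1).strip(), match.group(2).strip()
        if origin in KNOWN_CITIES and dest in KNOWN_CITIES:
            webbrowser.open(
                f"https://www.google.com/maps/dir/{quote(origin)}/{quote(dest)}"
            )
            return True
    # Fallback: any map-related word triggers a plain Maps search.
    if MAP_WORDS & set(text.split()):
        webbrowser.open("https://www.google.com/maps/search/" + quote(text))
        return True
    return False

handle("Way from New York City to Washington DC")  # opens the route
```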
I don't know which programming language you would prefer for this project, but I highly recommend Python: it is object-oriented, high-level, and has extensive support libraries. You haven't specified your target OS; if you choose Android, Python may not be a good option (although mobile use is possible). I will assume you are developing a desktop application, so my suggestions below presume Python.
First of all, you can use the SpeechRecognition library for speech-to-text, as in the sketch below. After getting text from speech, we can jump to the next step: analyzing the question.
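
A minimal speech-to-text sketch with that library (pip install SpeechRecognition pyaudio); note that the recognize_google() backend used here needs an internet connection:

```python
# Capture microphone audio and transcribe it with the free Google Web Speech API.
import speech_recognition as sr

recognizer = sr.Recognizer()
with sr.Microphone() as source:
    print("Say something...")
    audio = recognizer.listen(source)

try:
    text = recognizer.recognize_google(audio)
    print("You said:", text)
except sr.UnknownValueError:
    print("Could not understand the audio.")
```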
At present, deep learning is the state of the art, and TensorFlow is a great technology for taking advantage of it.
This is an amazing chatbot framework for building a conversational model for your custom chatbot. You edit its JSON file to create your conversational intents; after editing the JSON file, you can analyze the user's questions (i.e., your program can understand what the user said, parse the question, and extract the location the user requested). When you get the location from the question, the program can open a browser (for example, Google Maps) by executing an Ubuntu terminal command from Python. The intents file has roughly the shape sketched below.
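
An invented example of what such an intents file might look like, written here as a Python dict for brevity; the tag, patterns, and responses are made up for illustration:

```python
# Hypothetical intents structure; adapt it to the framework's actual schema.
intents = {
    "intents": [
        {
            "tag": "directions",
            "patterns": [
                "Way from New York City to Washington DC",
                "Route from Boston to Chicago",
            ],
            "responses": ["Opening the route on the map..."],
        },
    ]
}
```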
As a summary, when the user says "Way from New York City to Washington DC", the program will, in order:
Get text from the user's speech
Analyze the text via the trained system so that it understands what the user said
Extract the destination and current location specified by the user (many kinds of information can be obtained from the user's request) by using the structure of this JSON [I recommend this way], or perhaps NLP or some kind of string operations
Load the Google Maps URL for this location information via (for example) an Ubuntu terminal command, as in the one-line sketch below
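
One way to launch the Maps URL via an Ubuntu terminal command, as in the last step; the locations stand in for values parsed from the intent:

```python
# Open the route in the default browser on Linux via xdg-open.
import subprocess
from urllib.parse import quote

origin, destination = "New York City", "Washington DC"  # parsed from the intent
url = f"https://www.google.com/maps/dir/{quote(origin)}/{quote(destination)}"
subprocess.run(["xdg-open", url])
```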
