How to set up a custom dataset to train Whisper on - dataset

How can I set up a dataset to train the Whisper model with custom data? I have transcriptions in an Excel file and audios in another folder; all the examples load a dataset from Hugging Face.
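A minimal sketch of one way this might be wired up with the datasets library, assuming the Excel file has columns named audio_file and transcription (both names are placeholders for whatever your sheet actually uses) and the audio files sit in an audios/ folder:

import pandas as pd
from datasets import Dataset, Audio

# Assumed layout: transcriptions.xlsx has "audio_file" and "transcription" columns,
# and the referenced audio files live under ./audios/ -- adjust to your data.
df = pd.read_excel("transcriptions.xlsx")
df["audio"] = "audios/" + df["audio_file"]

dataset = Dataset.from_pandas(df[["audio", "transcription"]])
# Casting the path column to an Audio feature makes each example decode on access,
# resampled to the 16 kHz that Whisper expects.
dataset = dataset.cast_column("audio", Audio(sampling_rate=16_000))
dataset = dataset.train_test_split(test_size=0.1)

From there, a typical Whisper fine-tuning example should work largely unchanged, since it only expects a datasets object with an audio column and a text column in place of the one it pulled from the Hub.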

Related

Complex Transfer of Data from Email Attachment to Template

I have a complicated project I am undertaking for my work, and any advice you could give would be really appreciated.
Basically here is what I want to do:
Filter Emails from specific people
Download those emails' attachments (they are excel format) to a specific folder in Drive (called "Input")
Pull the data from those spreadsheets to specific cells of another spreadsheet (master template) I have in another part of Drive (probably over a hundred cell value transfers)
Automatically delete the downloaded attachments in the drive to prepare for the next (same name) files that will be downloaded tomorrow.
Once the master is filled up, make a copy of the entire spreadsheet, rename it to today's date, and then wipe the master to rinse and repeat the same process tomorrow.
So right now I am just working with this conceptually; here is a basic blueprint of what I am thinking about doing, but your input would be much appreciated:
Filter emails using Gmail's Label system
Use this method to download emails: https://www.splitbrain.org/blog/2017-01/30-save_gmail_attachments_to_google_drive
Create a permanent spreadsheet in the "Input" folder (called "Master Array") to array all the data from the downloaded spreadsheets and their respective sub-sheets. The goal here being to have one constant File_ID housing all the data.
Create a search/if array function in the Master Array which will search for the correct files by their respective names and array their data in the correct subsheets of the Master Array spreadsheet (i.e. if the file name contains "Company Sales", array the data in the "Sales" subsheet).
ImportRange, Query, Vlookup, etc. that "Master Array" spreadsheet and pull all the values I need out of there to the respective cells in the Master Template they need to be.
Once the Master Template is built, I want to copy the entire spreadsheet, rename it to today's date, and then wipe the original (preparing it for the same function tomorrow). (using google timer trigger)
Delete all the downloaded email attachments in the drive folder "Input" to rinse and repeat for the same function the next day. (using google timer trigger).
Some questions I have:
Is there a more efficient way to do this?
What is the best way to copy data from one spreadsheet to another: would it be quicker in a script, or as an import function within each individual cell of the master template?
Can I use a loop/if function to pull certain cells to certain sheets within the Master Template, basically having functions for each sheet name, so say IF sheetname="Sales", pull cell A2 from the other spreadsheet to B3... etc.?
Sorry this is very long and robust, just wanted to see if this is possible to do comprehensively or not. Thank you for any and all input, I am relatively new to Sheets so forgive my naivety.
1) You can use a search query when retrieving the messages list [1][2].
2) You could develop your own code to get the messages list and get the attachments for each email using the GmailApp class [3].
3, 4, 5) You'd first need to convert the Excel files to Google Sheets [4] to be able to manage the spreadsheets easily with the SpreadsheetApp class [5].
6) You can copy the Spreadsheet like this [6] and get the entire range of data to clear it [7].
7) Use this [8].
[1] https://developers.google.com/gmail/api/guides/filtering
[2] https://support.google.com/mail/answer/7190
[3] https://developers.google.com/apps-script/reference/gmail/gmail-app
[4] Converting .xls to google spreadsheet in google apps script
[5] https://developers.google.com/apps-script/reference/spreadsheet/spreadsheet-app
[6] Google Apps Script, copy one spreadsheet to another spreadsheet with formatting
[7] https://developers.google.com/apps-script/reference/spreadsheet/range#clear()
[8] https://developers.google.com/apps-script/reference/drive/drive-app#removeFile(File)

How can I write XML in Silverlight from a master-details metadata object

I am astonished that there are no new posts about Silverlight. This is very harmful to our company, because my company is still on Silverlight, they have not finished their ERP, and they have no chance to move from Silverlight to another technology at this moment. For that reason we have to keep studying Silverlight.
My questions:
I want to convert my master-details objects to XML. Suppose I have a master table like table_Personal and a details table JOB INFO; how can I write both into a single XML file at a time?
Thanks.
There are several ways of doing this, but you say you have two tables. You can set up a relationship between these objects in a DataSet, and then easily generate XML from it. DataSet.WriteXML doesn't seem to exist in Silverlight, but the post Convert Dataset to XML describes how to implement this yourself.
To use a DataSet in Silverlight you should put using System.Data; at the top of the .cs file. Example of DataSet code:
System.Data.DataSet myDataSet = new System.Data.DataSet();
myDataSet.Tables.Add(...);   // add the master table (e.g. table_Personal)
myDataSet.Tables.Add(...);   // add the details table (e.g. JOB INFO)
myDataSet.Relations.Add(new System.Data.DataRelation(....));   // relate master and details on their key column

Fast solution to create and fill a db table with data from a csv file

As in the title: I want to create a table in my db and then fill it with records from my csv file (the separator is ";"). These records (and the whole table) won't be updated or changed by anyone. I just have to create the db (Postgres only) with these records. Later I want to take some records from this table at random. Any tips? Or maybe taking random records directly from the csv file instead of from the db is a better solution? Please help.
You can write a command-line tool that reads your csv file and uses ActiveRecord to talk to the database. In this case you create your model and database as you would in any other Rails application.
To read your csv file you could do this:
require 'csv'

# fileName is the path to your file; col_sep handles the ";" separator from the question
CSV.foreach(fileName, headers: true, col_sep: ";", encoding: "UTF-8") do |row|
  row['myfield'].to_s
end
You need the csv library and you have to replace myfield with the name of the column you want to use. Once you have the data in Ruby, you create your models as you would in Rails. Put the definitions in the same file or use require to reference your models.
Alternatively you could just read the csv file and randomly decide if you want to use the data. In this case you do not need to put it in the database.
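If you go that database-free route, here is a rough sketch of the same random-sampling idea in Python rather than Ruby (purely illustrative; data.csv and the sample size are placeholders), using the ";" separator from the question:

import csv
import random

# Read the semicolon-separated file once, then pick a few records at random.
with open("data.csv", newline="", encoding="utf-8") as f:
    rows = list(csv.DictReader(f, delimiter=";"))

sample = random.sample(rows, k=min(5, len(rows)))
for row in sample:
    print(row)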

Is it possible to get the source of the document?

I'm implementing my very first Solr based search application. The application is currently using a database server and local files (txt, xml) as data sources.
I was wondering if it's possible to show the source of each document in the results. Is it possible to say, for example: Result1 from 1.txt, Result2 from database... etc.?
You can create an extra field, for example source, which could hold that information. This field can be populated straight from the data import handler, making it pretty straightforward.
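As a rough illustration of the idea from the client side (not the data-import-handler route mentioned above), here is a sketch using the pysolr Python client; the core URL, field names, and documents are all made up, and the source field would still need to exist in your Solr schema:

import pysolr

# Hypothetical core URL; "source" must be defined (and stored) in the schema.
solr = pysolr.Solr("http://localhost:8983/solr/mycore", always_commit=True)

solr.add([
    {"id": "1", "content": "text taken from the file", "source": "1.txt"},
    {"id": "2", "content": "row pulled from the db", "source": "database"},
])

# At query time the stored "source" field comes back with each hit.
for result in solr.search("text"):
    print(result["id"], result.get("source"))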

SQL2008 Integration Services - Loading CSV files with varying file schema

I'm using SQL2008 to load sensor data in a table with Integration Services. I have to deal with hundreds of files. The problem is that the CSV files all have slightly different schemas. Each file can have a maximum of 20 data fields. All data files have these fields in common. Some files have all the fields others have some of the fields. In addition, the order of the fields can vary.
Here's an example of what the file schemas look like.
Station Name,Station ID,LOCAL_DATE,T_1,TD_1,RH_1,CL_1,RS_1,RI_1,PR_1,RD_1,SH_1,CL_2
Station Name,Station ID,LOCAL_DATE,T_1,TD_1,RH_1,CL_1,RS_1,RI_1,PR_1,WS_1,WD_1,WSM_1,WDM_1,SH_1
Station Name,Station ID,LOCAL_DATE,T_1,TD_1,RH_1,RS_1,RI_1,PR_1,RD_1,WS_1,WD_1,WSM_1,WDM_1
Station Name,Station ID,LOCAL_DATE,T_1,RH_1,RS_1,PR_1,VI_1,PW_1,WS_1,WD_1,WSM_1
Station Name,Station ID,LOCAL_DATE,T_1,RH_1,RS_1,WS_1,WD_1,WSM_1
Station Name,Station ID,LOCAL_DATE,T_1,RH_1,RS_1,PR_1,VI_1,WS_1,WD_1,WSM_1
I'm using a Data Flow Script Task to process the data via CreateNewOutputRows() and MyOutputBuffer.AddRow(). I have a working package to load the data, however it's not reliable or robust, because as I add more files the package fails whenever a file's schema has not been defined in CreateNewOutputRows().
I'm looking for a dynamic solution that can cope with the variation in the file schema. Does anyone have any ideas?
Who controls the data model for the output of the sensors? If it's not you, do they know what they are doing? If they create new and inconsistent models every time they invent a new sensor, you are pretty much up the creek.
If you can influence or control the evolution of the schemas for CSV files, try to come up with a top level data architecture. In the bad old days before there were databases, files made up of records often had, as the first field of each record, a "record type". CSV files could be organized the same way. The first field of every record could indicate what type of record you are dealing with. When you get an unknown type, put it in the "bad input file" until you can maintain your software.
If that isn't dynamic enough for you, you may have to consider artificial intelligence, or looking for a different job.
Maybe the command line is an option: from cmd you can use SQL Server's bulk import tooling (for example bcp) to import the csv.
If the CSV files that have identical formats use the same file name convention, or if they can be separated out in some fashion, you can use a ForEach Loop Container for each file schema type.
One possible way to separate out the CSV files is to run a Script Task (in VB) in SSIS that reads the first row of each CSV file, checks for the differing types (if the column names are in the first row), and then moves the files to the appropriate folder for use in the ForEach Loop Container, as in the sketch below.
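For what it's worth, here is the same header-inspection idea sketched in Python rather than a VB Script Task; the folder names and the mapping from header sets to schema types are made up for illustration:

import csv
import shutil
from pathlib import Path

# Hypothetical mapping from a file's header set to a schema-type folder;
# anything unrecognised goes to "unknown" for manual inspection.
KNOWN_SCHEMAS = {
    frozenset(["Station Name", "Station ID", "LOCAL_DATE",
               "T_1", "RH_1", "RS_1", "WS_1", "WD_1", "WSM_1"]): "schema_basic",
}

for csv_path in Path("incoming").glob("*.csv"):
    with csv_path.open(newline="") as f:
        header = next(csv.reader(f))          # read only the header row
    folder = KNOWN_SCHEMAS.get(frozenset(header), "unknown")
    target = Path("sorted") / folder
    target.mkdir(parents=True, exist_ok=True)
    shutil.move(str(csv_path), str(target / csv_path.name))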
