I already know how to upload a .csv file to Wave Analytics, but I can't do the same with multiple files (2 files).
I always get this error:
(column: Column01) strconv.ParseFloat: parsing "\ufeff0464589401": invalid syntax
The first line of the second file is:
0464589401
I have no idea what causes this.
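The `\ufeff` at the start of the failing value is a UTF-8 byte-order mark (BOM): the second file begins with the bytes EF BB BF, and the parser tries to read them as part of the first number. Stripping the BOM before upload fixes this. A minimal sketch in Python (the file paths are placeholders):

```python
def strip_bom(in_path, out_path):
    """Copy a CSV file, dropping a leading UTF-8 BOM if present."""
    with open(in_path, "rb") as f:
        data = f.read()
    # The UTF-8 encoding of U+FEFF is the 3-byte sequence EF BB BF.
    if data.startswith(b"\xef\xbb\xbf"):
        data = data[3:]
    with open(out_path, "wb") as f:
        f.write(data)
```

Alternatively, if you preprocess the files in Python anyway, opening them with encoding `utf-8-sig` drops the BOM transparently.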
I'm using the Java zlib package to extract text from a PDF file. When I feed the first compressed stream found in the file to inflate(), it returns a Z_NEED_DICT error. After getting this error I tried passing an arbitrary dictionary array to set_inflate_dictionary() followed by another inflate() call, but the same "dictionary needed" error appears. The zlib manual says the decompressing application should provide the same dictionary that was used to compress the data. How can one know exactly which dictionary bytes the author used when creating this PDF? Or can the dictionary be extracted from the PDF file itself?
You did not correctly locate or extract the zlib stream. PDF does not use zlib streams that require preset dictionaries, so a correctly extracted FlateDecode stream will inflate without one; Z_NEED_DICT almost always means the wrong bytes were fed to inflate().
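This can be checked directly: in a PDF, FlateDecode stream data sits between the `stream` and `endstream` keywords and is plain zlib. A rough sketch in Python (real PDFs may be encrypted, use other filters, or contain binary data that confuses a naive regex, so a proper PDF library is safer in practice):

```python
import re
import zlib

def extract_flate_streams(pdf_bytes):
    """Find stream...endstream spans and try to inflate each as raw zlib.

    A correctly extracted FlateDecode stream needs no preset dictionary;
    a Z_NEED_DICT-style error usually means the slice boundaries are wrong.
    """
    out = []
    # Per the PDF spec, the 'stream' keyword is followed by an end-of-line.
    for m in re.finditer(rb"stream\r?\n(.*?)\r?\nendstream", pdf_bytes, re.DOTALL):
        try:
            out.append(zlib.decompress(m.group(1)))
        except zlib.error:
            pass  # not Flate-encoded, or not a clean match
    return out
```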
We have started a project to create a Turkish speech recognition dataset to use with DeepSpeech.
We finished the preprocessing of the ebook.
But we couldn't finish the forced alignment process with Aeneas.
According to its tutorial on forced alignment, you need a text file and its recorded audio file. While preprocessing the ebook we created 430 text files, edited and cleaned for the aeneas format (divided into paragraphs and sentences using the nltk library).
But while processing our Task object and creating its output file (a JSON file), we couldn't merge the output files: for every aeneas text file, the alignment starts from the beginning of the audio file.
It seems we need to split our audio file into 430 parts, but that is not an easy process.
I tried to merge the JSON files with:
import json
import glob

result = []
for f in glob.glob("*.json"):
    with open(f, "rb") as infile:
        result.append(json.load(infile))

with open("merged_file.json", "w") as outfile:
    json.dump(result, outfile)
But it didn't work, because during the forced alignment process aeneas starts from the beginning of the audio file for each aeneas text file.
Is it possible to create a Task object that includes all 430 aeneas text files and appends them into one output (JSON) file with their timings (in seconds) adjusted, using a single audio file?
Our task object:
# create Task object
from aeneas.task import Task

config_string = "task_language=tur|is_text_type=plain|os_task_file_format=json"
task = Task(config_string=config_string)
task.audio_file_path_absolute = "/content/gdrive/My Drive/TASR/kitaplar/nutuk/Nutuk_sesli.mp3"
task.text_file_path_absolute = "/content/gdrive/My Drive/TASR/kitaplar/nutuk/nutuk_aeneas_data_1.txt"
task.sync_map_file_path_absolute = "/content/gdrive/My Drive/TASR/kitaplar/nutuk/syncmap.json"
Btw, we are working on Google Colab with python 3.
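If the audio were split into chunks (the very step the question calls hard), the per-chunk sync maps could be merged afterwards by shifting each fragment's timings by the running duration of the preceding chunks. This is only a sketch, not part of the aeneas API; it assumes the standard aeneas JSON layout with a top-level "fragments" list whose entries carry "begin" and "end" as strings of seconds, and it requires you to supply each chunk's audio duration yourself:

```python
import json

def merge_syncmaps(paths, durations):
    """Concatenate aeneas JSON sync maps, shifting each file's fragments
    by the total duration of the audio covered by the previous files.

    durations[i] is the length in seconds of the audio chunk aligned by
    paths[i] (an assumption: these must be measured separately).
    """
    merged = {"fragments": []}
    offset = 0.0
    for path, duration in zip(paths, durations):
        with open(path, "r", encoding="utf-8") as f:
            syncmap = json.load(f)
        for frag in syncmap["fragments"]:
            frag["begin"] = "%.3f" % (float(frag["begin"]) + offset)
            frag["end"] = "%.3f" % (float(frag["end"]) + offset)
            merged["fragments"].append(frag)
        offset += duration
    return merged
```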
I managed to solve my own question.
Instead of combining the JSON files, I could combine the aeneas text files with this code:
with open("/content/gdrive/My Drive/TASR/kitaplar/{0}/{1}/{2}_aeneas_data_all.txt".format(
        book_name, chapter, book_name), "wb") as outfile:
    for i in range(1, count - 1):
        file_name = "/content/gdrive/My Drive/TASR/kitaplar/{0}/{1}/{2}_aeneas_data_{3}.txt".format(
            book_name, chapter, book_name, str(i))
        # print(file_name)
        with open(file_name, "rb") as infile:
            outfile.write(infile.read())
So after combining the aeneas text files, I can create a single JSON file that contains all the paragraphs.
How does an application detect the type of a file?
I found out that every file has a header section that contains information about the file.
My question is: how does an application use that header to detect the file type?
Every file in a file system has some metadata associated with it. For example, I changed an audio file's extension from .mp3 to .txt and then opened that file with VLC, and VLC was still able to play it.
I want to know how I can access that header.
Just to give you some more details:
A file extension is basically a way to indicate the format of the data (for example, TIFF image files have a format specification).
This way an application can check if the file it handles is of the right format.
Some applications don't check the file format (or accept a wrong one) and just try to use the data as the format they need. So for your .mp3 file, the data in the file is not changed when you simply change the extension to .txt.
When VLC reads the .txt byte by byte and interprets it as an .mp3, it can still extract the correct music data from the file.
Now some files include a header for extra validation of what kind of format the data inside is. For example, a Unicode text file should include a BOM to indicate how the data in the file needs to be handled. This way an application can check whether the header tag matches the expected header, so it knows for sure that your '.txt' file actually contains data in the MP3 format.
There are quite a few applications for reading those header tags, but they are often format-specific. This TIFF Tag Viewer, for example (I have used it in the past to check the header tags of my TIFF files).
So you can either open your file in some kind of hex viewer and look up in the format specification what each byte means, or search Google for a header viewer for the format you want to inspect.
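Those leading bytes (often called magic bytes or a file signature) can also be read with a few lines of code. A minimal sketch in Python; the signature table is a small illustrative subset, not a complete registry:

```python
# A few well-known file signatures (magic bytes) mapped to format names.
# Illustrative subset only; e.g. MP3s without an ID3 tag start differently.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"\xff\xd8\xff", "JPEG image"),
    (b"%PDF-", "PDF document"),
    (b"ID3", "MP3 audio (with ID3 tag)"),
    (b"PK\x03\x04", "ZIP archive (also docx/xlsx/jar)"),
]

def sniff_format(path):
    """Return a format name based on the file's leading bytes,
    ignoring the extension entirely (as VLC effectively does)."""
    with open(path, "rb") as f:
        head = f.read(16)
    for magic, name in SIGNATURES:
        if head.startswith(magic):
            return name
    return "unknown"
```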
I have a package that gives me a very confusing "Text was truncated or one or more characters had no match in the target code page" error, but only when I run the full package in the control flow, not when I run the task by itself.
The first task takes CSV files and combines them into one file. The next task reads the output of the previous task and begins to process the records. What is really odd is that the truncation error is thrown by the flat file source in the second step, which is the exact same flat file that was the destination in the previous step.
If there were a truncation error, wouldn't it be thrown by the previous step that created the file? Since the first step created the file without truncation, why can't I read that same file in the very next task?
Note: the only thing that makes this package different from the others I have worked on is that I am dealing with special characters and using code page 65001 (UTF-8) to capture the fields that contain them. My other packages all referenced flat file connection managers with code page 1252.
The problem was caused by the foreach loop and the expression on ColumnNamesInFirstDataRow, where I have the formula "#[User::count_raw_input_rows] < 0". I initialize a variable to -1 and assign the expression to ColumnNamesInFirstDataRow for the flat file. Inside the loop I update the variable with a row counter on each read of a CSV file. This writes the header the first time (-1 < 0) but avoids repeating it for all the other CSV files. When I exit the loop and try to read the input file, it treats the header as data and blows up. I only avoided this in my last package because I hadn't tightened the column definitions for the flat file the way I did in this one. Thanks for the help.
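The header-once-then-skip behavior the foreach loop's expression implements can be sketched outside SSIS. This Python fragment (illustrative only, with placeholder paths, not the SSIS fix itself) mirrors what the expression accomplishes: the merged output carries exactly one header row, so the downstream reader must be configured to treat the first row as column names:

```python
import glob

def merge_csvs(pattern, out_path):
    """Concatenate CSV files matching `pattern`, keeping the header row
    only from the first file -- the behavior the foreach-loop expression
    on ColumnNamesInFirstDataRow is emulating."""
    first = True
    with open(out_path, "w", encoding="utf-8") as out:
        for path in sorted(glob.glob(pattern)):
            with open(path, "r", encoding="utf-8") as f:
                lines = f.readlines()
            # After the first file, drop each file's header line.
            out.writelines(lines if first else lines[1:])
            first = False
```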
I have a problem in my Grails application, which reads a txt file stored on disk and then sends the file to the client.
Right now I achieve this by reading the file line by line and storing the lines in a String array.
After reading all the lines, the String array is sent to the client as JSON.
In my GSP's JavaScript I take that array and display its contents in a textarea:
textarea.value = arr.join("\n\n");
This operation is repeated every minute via Ajax.
My problem is that the txt file the server reads contains about 10,000 to 20,000 lines.
Reading all those 10,000+ lines and sending them as an array causes IE8 to hang and finally crash.
Is there another, easier way to send the whole file over HTTP and display it in the browser?
Any help would be greatly appreciated.
Thanks in advance.
EDIT:
While Googling I found that streaming the file input/output is a better way to display the file contents in a browser, but I couldn't find an example of how to do it.
Can anyone share an example?
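The question targets Grails, but the underlying idea, serving the file in pages instead of one 10,000-line array, can be sketched in Python for illustration. The function and its parameters are hypothetical, not a Grails or HTTP API; the same logic would live in a Grails controller action that takes `start` and `count` request parameters:

```python
def read_lines_page(path, start, count):
    """Return `count` lines starting at zero-based line `start`,
    without loading the whole file into memory. The client can then
    request successive pages via Ajax instead of one huge array."""
    page = []
    with open(path, "r", encoding="utf-8") as f:
        for i, line in enumerate(f):
            if i >= start + count:
                break  # stop reading once the page is full
            if i >= start:
                page.append(line.rstrip("\n"))
    return page
```

Paging also sidesteps the IE8 crash: the browser only ever holds one page of lines, and each Ajax poll can fetch just the lines added since the last request.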