Cannot load the CSV file in Weka

I have tried many times to load this *.csv file, but it keeps failing. I am using Weka 3.7.
Here is the error:
Wrong number of values. Read 1, expected 12, read Token[EOL], line 2
This is line 2 of my file:
7;0.27;0.36;20.7;0.045;45;170;1.001;3;0.45;8.8;6
I don't know what is wrong with it. Can someone help me? Thank you very much.

I tried importing a semicolon-delimited file as a CSV into Weka, and what appeared to happen was that the contents were loaded as a single attribute (due to the lack of commas in the file). I didn't get the error that you reported.
What you need to do is replace all of the semicolons with commas and then load the file again. In your case above, I assumed that the first line contains the attribute names, which loaded successfully in my test.
The format Weka is expecting is the comma-separated values (CSV) format.
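If you would rather script that conversion than do it in a text editor, a minimal Python sketch along these lines should work (the file names input.csv and output.csv are placeholders, not from the original question):

import csv

# Read the semicolon-delimited rows and write them back out comma-delimited.
with open("input.csv", newline="") as fin, open("output.csv", "w", newline="") as fout:
    reader = csv.reader(fin, delimiter=";")
    writer = csv.writer(fout)
    for row in reader:
        writer.writerow(row)

Using csv.reader rather than a blind string replace avoids corrupting any quoted field that happens to contain a semicolon.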

Related

"Too few data elements" error in Knime CSV Reader

I am receiving the error below when executing the CSV Reader on a file that contains around 400k rows.
Error:
ERROR CSV Reader 2:1 Execute failed: Too few data elements (line: 2 (Row0), source: 'file:/Users/shobha.dhingra/Desktop/SBC:Non%20SBC/SBC.csv')
I have tried executing another CSV file with a few lines and did not face this issue.
It is not about the number of lines, but about the content of the line (line 2 in your case). It seems your SBC.csv file is not correct: it either has extra header content, or the second line is missing the commas that represent the missing cells.
You can use the CSV Reader node's Support Short Lines option to let KNIME handle this case by producing missing cells.
I get this error when end-of-line characters exist in a field. You could load the file into a text editor and look for non-printing characters (tabs, carriage returns, etc.) between your delimiters.
If you can't get a clean version of the file, consider using the regex [^ -~] to identify any character that is not a space or a visible ASCII character.
I hope this helps.
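If you need to locate the offending record in a 400k-row file, a short script can report rows whose field count differs from the header and flag anything matched by the [^ -~] pattern above. A minimal Python sketch (the file name SBC.csv comes from the error message; a comma delimiter and UTF-8 encoding are assumptions):

import csv
import re

non_printable = re.compile(r"[^ -~]")  # anything outside printable ASCII

with open("SBC.csv", newline="", encoding="utf-8", errors="replace") as f:
    reader = csv.reader(f)
    header = next(reader)
    for recno, row in enumerate(reader, start=2):
        if len(row) != len(header):
            print(f"record {recno}: {len(row)} fields, expected {len(header)}")
        bad = non_printable.findall(",".join(row))
        if bad:
            print(f"record {recno}: non-printable characters {bad!r}")

Whichever record it prints first is the one worth opening in a text editor.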

SQL Server: add headers after extracting data with a bat file

I have many files that are extracted into .txt with a batch file, but they don't have headers. I've read that a possible solution is to append the exported rows to a .txt file that already contains the headers.
With this:
echo. >> titles.txt
type data.txt >> titles.txt
This takes a lot of time and is not efficient, since it appends the big data file to the small file with the headers.
Another possible solution is to hardcode the titles into the SQL query, but this changes the type of the columns (if they are numeric, they will be changed to varchar).
Is there a way to insert the headers into the first row of the data .txt, rather than doing it the other way around?
I might be wrong, but as far as I know (and from earlier experiments doing exactly what you describe): no, it is not possible. The tools mentioned act on the file sequentially. You can open a file for reading, writing, or appending. If you open titles.txt for writing, it is overwritten and therefore empty; if you open it for appending, you can only add to the end of the file, so the data always ends up after the header. The only way it might work, and it is pretty nasty, is to append the title to the end of the file and, during later processing (e.g. in Excel or whatever), re-sort the rows and move the last one to the beginning. But as mentioned: nasty, and not really the way to go.
If the number of files to process is a bigger problem than any individual file size, switching from bcp to sqlcmd might help.
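Whichever route is taken, copying the big file once is unavoidable with plain file operations, but it can at least be done in a single streaming pass that never loads the file into memory and leaves the SQL column types untouched. A minimal Python sketch (titles.txt, data.txt, and combined.txt are placeholder names):

import shutil

# Write the header file first, then stream the large data file after it in chunks.
# titles.txt is assumed to end with a newline so the first data row starts on its own line.
with open("combined.txt", "wb") as out:
    with open("titles.txt", "rb") as headers:
        shutil.copyfileobj(headers, out)
    with open("data.txt", "rb") as data:
        shutil.copyfileobj(data, out)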

SSIS truncation error only in control flow

I have a package that is giving me a very confusing "Text was truncated or one or more characters had no match in the target code page" error, but only when I run the full package in the control flow, not when I run just the task by itself.
The first task takes CSV files and combines them into one file. The next task reads the output of the previous task and begins to process the records. What is really odd is that the truncation error is thrown by the flat file source in the second step. This is the exact same flat file source that was the destination in the previous step.
If there were a truncation error, wouldn't it be thrown by the previous step that tried to create the file? Since the first step created the file without truncation, why can't I just read that same file in the very next task?
Note: the only thing that makes this package different from the others I have worked on is that I am dealing with special characters and using code page 65001 (UTF-8) to capture the fields that contain them. My other packages all referenced flat file connection managers with code page 1252.
The problem was caused by the foreach loop and the ColumnNamesInFirstDataRow expression, where I use the formula #[User::count_raw_input_rows] < 0. I have a variable initialized to -1 and assign it to ColumnNamesInFirstDataRow for the flat file. Inside the loop I update the variable with a row counter on each read of a CSV file, which writes the header the first time through (-1) but then skips it for all the other CSV files. When I exit the loop and try to read the combined file, the header is treated as data and the package blows up. I only got away with this in my last package because I didn't tighten the column definitions for the flat file the way I did with this one. Thanks for the help.
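Outside of SSIS, the header-once-then-skip logic that the expression implements inside the loop looks roughly like this minimal Python sketch (the paths input/*.csv and combined.csv are placeholders, not taken from the original package):

import glob

with open("combined.csv", "w", newline="") as out:
    wrote_header = False
    for path in sorted(glob.glob("input/*.csv")):
        with open(path, newline="") as f:
            header = f.readline()          # first line of each source file is its header
            if not wrote_header:
                out.write(header)          # keep the header from the first file only
                wrote_header = True
            for line in f:                 # copy the remaining data rows
                out.write(line)

The combined file then has exactly one header row, so whatever reads it back must be told that the first row contains column names; otherwise that row is parsed as data, which is the failure described above.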

Identifying data file type

I have a huge 1.9 GB data file without an extension that I need to open to get some data from. The problem is that the file is extension-less, and I need to know what extension it should have and what software I can open it with to view the data in a table.
Here is a picture of the contents (screenshot not reproduced here). It is only a two-line file. I already tried opening it as CSV in Excel, but that did not work. Any help?
I have never used it, but you could try this:
http://mark0.net/soft-tridnet-e.html
explained here:
http://www.labnol.org/software/unknown-file-extensions/20568/
The third "column" of that line looks 99% chance to be from php's print_r function (with newlines imploded to be able to stored on a single line).
There may not be a "format" or a program to open it with if it's just some app's custom debug/output log.
A quick Google search found a few programs that split large files into smaller units. That may make it easier to load into something (which may or may not be Notepad++) for reading.
It shouldn't be too hard to mash out a script that reads the lines and reconstitutes the session "array" into a more readable format (read: vertical, not inline), but it would be a one-off custom job, since no one other than the holder of your file would have a use for it.
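As a starting point for that kind of one-off script, and to peek at the file without loading all 1.9 GB, a minimal Python sketch such as this reads only the first few kilobytes, reports how text-like the sample is, and prints the beginning of the first line (the file name bigfile is a placeholder):

CHUNK = 4096

with open("bigfile", "rb") as f:
    head = f.read(CHUNK)

# Rough text-vs-binary check: how much of the sample is printable ASCII or whitespace?
printable = sum(1 for b in head if 32 <= b <= 126 or b in (9, 10, 13))
print(f"{printable}/{len(head)} bytes of the sample look like plain text")

# Preview the start of the first line, decoded leniently and truncated for readability.
print(head.split(b"\n", 1)[0][:200].decode("utf-8", errors="replace"))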

Easy way to inspect BCP .dat file?

I'm getting the BCP error "Unexpected EOF encountered in BCP data-file" during import, which is probably misleading. I strongly suspect that some field has been added to the table, or that there's some offending character in the file.
How would I go about inspecting the contents of .dat file visually?
Are there any good hex viewers in which I can quickly adjust the row length to see the data in a tabular manner?
Other suggestions are also appreciated.
I guess it depends on your input format. Is it binary input? If so, it's going to be hard. I use Visual Studio to open the file in its binary viewer, but it's far from easy. The usual suspects are CRLFs in a text field, or text that contains your field delimiter or end-of-line character.
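If a hex viewer is not at hand, a quick script can count delimiters per row and hex-dump any row that looks wrong, which is often enough to spot the stray delimiter or embedded CRLF mentioned above. A minimal Python sketch, assuming the .dat file was exported in character format rather than native binary (the file name export.dat, the tab delimiter, and the expected column count are all placeholders):

DELIM = b"\t"          # assumed field terminator
EXPECTED_FIELDS = 12   # set this to your table's column count

with open("export.dat", "rb") as f:
    for rowno, line in enumerate(f, start=1):
        fields = line.count(DELIM) + 1
        if fields != EXPECTED_FIELDS:
            print(f"row {rowno}: {fields} fields")
            print(line[:120].hex(" "))   # hex dump of the first bytes of the row
        if rowno >= 1000:                # only sample the head of a large file
            break

A native-format (-n) export is not line-oriented, so for that case the binary viewer approach above remains the practical option.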
