Receiving the below error while executing a CSV file which includes around 400k rows.
Error:
ERROR CSV Reader 2:1 Execute failed: Too few data elements (line: 2 (Row0), source: 'file:/Users/shobha.dhingra/Desktop/SBC:Non%20SBC/SBC.csv')
I have tried executing another CSV file with a few lines and did not face this issue.
It is not about the number of lines, but the content of the line (line 2 in your case). It seems your SBC.csv file is not correct: either it has extra header content, or the second line is missing the commas that would represent the missing cells.
You can use the CSV Reader node's Support Short Lines option to let KNIME handle this case by producing missing cells.
I get this error when end-of-line characters exist in a field. You could load the file into a text editor and look for non-printing characters (tabs, carriage returns, etc.) between your delimiters.
If you can't get a clean version of the file, consider using the regex [^ -~] to identify any character that is not a space or a visible character.
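For example, a minimal Python sketch that uses that regex to report where such characters occur (the UTF-8 encoding is an assumption about your file):

import re

# [^ -~] matches any character that is not a space or a visible ASCII character.
nonprintable = re.compile(r"[^ -~]")

with open("SBC.csv", encoding="utf-8", errors="replace") as f:
    for lineno, line in enumerate(f, start=1):
        for match in nonprintable.finditer(line.rstrip("\r\n")):
            print(f"line {lineno}, column {match.start() + 1}: {match.group()!r}")

Note it will also flag legitimate non-ASCII characters (accented letters and the like), so treat the output as a list of candidates to inspect.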
I hope this helps.
Related
In an SSIS package, I want to send data from several instances to flat files. To do so, I create a dynamic connection string made of 3 variables:
".txt"
A network path
The file name (which is the instance name variable (string) that I use elsewhere in my package)
When I evaluate my expression at this point, for
TRIM(#[User::FileName]+REPLACE(#[User::ServerName],"\\","")+#[User::ExtensionFile])
I receive
\\test-01\TEMP\SQL01MyInstance.txt
But when I run the job, it's unable to create SQL01MyInstance.txt, and I receive this error:
[Flat File Destination [11]] Error: Cannot open the datafile "\\test-01\TEMP\SQL01MyInstance
.txt".
[SSIS.Pipeline] Error: Flat File Destination failed the pre-execute phase and returned error code 0xC020200E.
There's an unwanted space at the end of the filename; when I copy-paste the error message elsewhere, it appears to be a line return (before the .txt).
Does anybody know how I can get rid of that line return (which I'm assuming is making the job fail)?
Edit 1:
Rights on the destination folder are OK, because there's another flat file that I create in case of errors, and it's created normally after that failure; but that one doesn't have a dynamic name (normal behavior).
To remove the line return you can use the REPLACE() function with \r and \n:
REPLACE(REPLACE(TRIM(#[User::FileName]+REPLACE(#[User::ServerName],"\\","")+#[User::ExtensionFile]),"\r",""),"\n","")
Where
\r : carriage return
\n : line feed
The TRIM function only trims the space character (versus other functions which trim all white space):
TRIM does not remove white-space characters such as the tab or line feed characters. Unicode provides code points for many different types of spaces, but this function recognizes only the Unicode code point 0x0020. When double-byte character set (DBCS) strings are converted to Unicode they may include space characters other than 0x0020 and the function cannot remove such spaces. To remove all kinds of spaces, you can use the Microsoft Visual Basic .NET Trim method in a script run from the Script component.
https://learn.microsoft.com/en-us/sql/integration-services/expressions/trim-ssis-expression
You can try this first to see if it works (Trim first then concatenate):
TRIM(#[User::FileName]) + TRIM(REPLACE(#[User::ServerName],"\\","")) + TRIM(#[User::ExtensionFile])
If not, then you'll have to use the String.Trim() method in a Script Task/Component, as the MSDN article recommends (again, trim each variable first, then concatenate).
I have a package that is giving me a very confusing "Text was truncated or one or more characters had no match in the target code page" error, but only when I run the full package in the control flow, not when I run just the task by itself.
The first task takes CSV files and combines them into one file. The next task reads the output file of the previous task and begins to process the records. What is really odd is that the truncation error is thrown by the flat file source in the 2nd step. This is the exact same flat file which was the destination in the previous step.
If there was a truncation error wouldn't that be thrown by the previous step that tried to create the file? Since the 1st step created the file without truncation, why can't I just read that same file in the very next task?
Note: the only thing that makes this package different from the others I have worked on is that I am dealing with special characters and using code page 65001 (UTF-8) to capture the fields that have special characters. My other packages all referenced flat file connection managers with code page 1252.
The problem was caused by the foreach loop and using the ColumnNamesInFirstDataRow expression, where I have the formula "#[User::count_raw_input_rows] < 0". I have a variable initialized to -1, and I assign it to the ColumnNamesInFirstDataRow property for the flat file. While in the loop, I update the variable with a row counter on each read of a CSV file. This puts the header in the first time (-1) but then avoids repeating it for all the other CSV files. When I exit the loop and try to read the input file, it treats the header as data and blows up. I only avoided this in my last package because I didn't tighten the column definitions for the flat file like I did with this package. Thanks for the help.
I have tried many times to load this *.csv file, but I failed. I am using Weka 3.7.
Here is the error:
Wrong number of values. Read 1, expected 12, read Token[EOL], line 2
This is the line 2 in my file:
7;0.27;0.36;20.7;0.045;45;170;1.001;3;0.45;8.8;6
I don't know what is wrong with this.
Can someone help me? Thank you very much.
I tried to import a semicolon-delimited file as a CSV to open in Weka, and what appeared to happen was that the contents were loaded as a single attribute (due to the lack of commas in the file structure). I didn't get the error that you reported.
What you would need to do is replace all of the semicolons with commas and then load the contents again. In your case above, I assumed that the first line contained the attribute names, which loaded successfully in my test case.
As such, the format Weka is likely expecting is the Comma-Separated Values Format.
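If a text editor is impractical because the file is large, a minimal Python sketch of that conversion could look like this (the file names are placeholders); going through a CSV reader/writer rather than a blind search-and-replace keeps any semicolons inside quoted fields intact:

import csv

# Convert a semicolon-delimited file to the comma-delimited form Weka expects.
with open("data_semicolon.csv", newline="") as src, open("data_comma.csv", "w", newline="") as dst:
    reader = csv.reader(src, delimiter=";")
    writer = csv.writer(dst, delimiter=",")
    for row in reader:
        writer.writerow(row)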
I recently unloaded a customer table from an Informix DB, and several rows were rejected because the customer name column contained non-escaped vertical bar (pipe) characters, which is the default DBDELIMITER in the source DB. I found out that the field in their customer form has an input mask allowing any alphanumeric character to be entered, which can include any letters, numbers or symbols. So I persuaded the user to run a blanket update on that column to change the pipe symbol to a semicolon. I also discovered other rows containing asterisks and commas in different columns. I could imagine what would happen if this table were unloaded in CSV format, or what damage the asterisks could do!
What is the best character to define as a delimiter?
If tables are already tainted with pipes, commas, asterisks, tabs, backslashes, etc., what's the best way to clean them up?
I have to deal with large volumes of narrative data at my job. This is always a nightmare because users are apt to put ANY character in there, including unprintable characters. You can run a cleanup operation, but you have to do it every time you load data, and it likely won't work forever. Eventually someone will put in whatever character you choose as a separator, which is not a problem if your CSV handling libraries can handle escaping properly, but many can't. If this is a one-time load/unload, you're probably fine, but if you have to do it more often....
In the past I've changed the separator to the back-tick '`', the tilde '~', or the caret '^'. All failed in the current effort. The best solution I could come up with is to not use CSV format at all. I switched to XML. Even so, there were still characters that are illegal in XML, but these can be translated out with atlassian-xml-cleaner-0.1.jar.
Unload the customer table, replacing the default pipe delimiter: first string-search the data for a character that doesn't exist in it, e.g. "~", and unload with that:
unload to file delimiter "~"
select * from customer;
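To do that string search up front, a small Python check along these lines could confirm the candidate delimiter really is absent (the file name is a placeholder for any existing export of the data):

# Count occurrences of the candidate delimiter in an existing dump of the table.
with open("customer_dump.txt", "rb") as f:
    count = f.read().count(b"~")
print(f"'~' occurs {count} times")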
Clean your file (or not)
(vi replace string) :g/theoldstring/s//thenewstring/g
or
(unix prompt) sed 's/old-char/new-char/g' fileold > filenew
(Once clean, I'd personally change the "~" in the unload file back to "|" or "," per the CSV standard.)
Load to source db.
If you can, use a multi-character delimiter. It can still fail, but it should be much less likely to.
Or, escape the delimiter while writing the export file (Informix docs say "LOAD TABLE" escapes by prefixing delimiter characters with backslash). Proper CSV has quoting and escaping so it shouldn't matter if a comma is in the data, unless your exporter and loader cannot handle proper CSV.
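To illustrate that last point, here is a minimal Python sketch (the values and file name are made up) showing how a proper CSV library quotes fields that contain the delimiter, so the data survives a round trip:

import csv

# Made-up field values containing the troublesome characters (pipes, commas, asterisks).
rows = [["ACME | Widgets, Inc.", "New York"], ["Smith*Jones", "Boston"]]

with open("customers.csv", "w", newline="") as f:
    csv.writer(f).writerows(rows)       # fields containing the delimiter are quoted automatically

with open("customers.csv", newline="") as f:
    assert list(csv.reader(f)) == rows  # delimiters inside fields survive the round trip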
I have a fixed-length text file, except some of the lines end early, with a carriage return/line feed. I'm using a .fmt file.
Q: How do I tell SQL Server to use an empty string for the fields that are unaccounted for?
I should probably ask my client to pad his text file, but it would be easier to just process it with the lines that are terminated early.
You should write a pre-processor to condition the text file before doing the bulk insert.
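For example, a minimal Python sketch of such a pre-processor (the record length and file names are assumptions; adjust them to match your .fmt definition):

# Pad every line to the full fixed record width so BULK INSERT sees complete records.
RECORD_LEN = 120

with open("input.txt", newline="") as src, open("padded.txt", "w", newline="") as dst:
    for line in src:
        record = line.rstrip("\r\n")
        dst.write(record.ljust(RECORD_LEN) + "\r\n")   # pad short records with spaces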