I have a requirement to read a .DAT file in Talend where the data is delimited with a hex control character.
Below is the sample data:
I have tried tFileInputDelimited, tFileInputPositional, and tFileInputRaw, but nothing worked as expected. The delimiter is the hex character 0x07 (BEL).
How can I read this kind of file in Talend?
Your help is appreciated. Thank you in advance.
You can do it by using the Unicode escape of the character as the field separator: \u0007
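If you want to sanity-check the file outside Talend first, here is a small Python sketch (my illustration; "sample.dat" is a placeholder for your .DAT file) that counts the BEL delimiters in the first record and splits it:

# Confirm that the delimiter really is BEL (0x07) before configuring Talend
with open("sample.dat", "rb") as f:          # "sample.dat" is a placeholder
    first_record = f.readline()
print(first_record.count(b"\x07"), "BEL (0x07) delimiters in the first record")
for field in first_record.rstrip(b"\r\n").split(b"\x07"):
    print(field.decode("utf-8", errors="replace"))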
Hi, I am trying to read a CSV file where the floating-point numbers use a comma as the decimal separator. When I read the file with Spark, it just ignores the comma and concatenates the digits, so I end up with something like:
77563215,23 becomes 7756321523.00.
How can I make sure that, while reading it through Spark, I get the same numbers as in my original CSV file?
Try reading the CSV file using option("locale", "de-DE"); the German locale makes the parser treat the comma as the decimal separator.
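A minimal PySpark sketch of that approach, assuming Spark 3.x (where the CSV reader's locale option is applied when parsing DecimalType columns); the file name, column names, and schema are placeholders:

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, DecimalType

spark = SparkSession.builder.appName("comma-decimals").getOrCreate()

# Placeholder schema: declare the comma-decimal column as DecimalType
schema = StructType([
    StructField("id", StringType()),
    StructField("amount", DecimalType(18, 2)),
])

df = (spark.read
      .option("header", "true")
      .option("locale", "de-DE")   # treat ',' as the decimal separator
      .schema(schema)
      .csv("numbers.csv"))         # placeholder path
df.show()                          # 77563215,23 should load as 77563215.23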
We are trying to create a file format using the not character ¬ as the delimiter, but we can't get Snowflake to work with a file delimited this way. The documentation says multibyte delimiters are now supported. We've tried:
Just typing ¬ in the file format dialog
The hex code (permutations of 0xC2AC, 0xC20xAC, etc.)
The octal code 302 254, entered as permutations of \302254, etc.
But whatever we try, we get errors. When we type the delimiter directly, Snowflake seems to treat 0xC2 as the delimiter and gets confused by the second byte (0xAC). Using the hex or octal code gives an error about the wrong number of columns. Any advice, please?
The answer from Sergiu works perfectly:
For the octal format, use \302\254
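For context on why that works: the not sign (U+00AC) encodes to the two bytes 0xC2 0xAC in UTF-8, and those bytes are 302 and 254 in octal, hence the two-escape form. A quick Python check (my illustration):

print("¬".encode("utf-8"))    # b'\xc2\xac' -> two bytes, not one
print(oct(0xC2), oct(0xAC))   # 0o302 0o254 -> hence '\302\254'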
I have a text file on Unix with two columns; the first column contains strings in various languages (Chinese, Korean, Japanese, Arabic, English, French, German, etc.).
The file's current encoding is:
> file index.txt
index.txt: Non-ISO extended-ASCII English text, with LF, NEL line
terminators
I've been told that this file has a subset of entries (in column 1) that use a non-ASCII, non-UTF-8 encoding, and that I should convert the data in this column preferably to ASCII or, if that is not possible, to UTF-8.
For example:
1. How a user sees it: 'Bibliothe<C3>que'.
2. Via vim: 'Bibliothèque'.
3. Via less: 'Bibliothèque'.
I have already tried many conversions and methods (for days), but none of them converted it as expected.
For example, I tried to change the encoding to UTF-8:
iconv -f CP1256 -t UTF-8 < index.txt > index.txt.2
> file index.txt.2
index.txt.2: UTF-8 Unicode English text
But the characters seem to be corrupted in the new file. I got:
1. Via vim: 'Bibliothﺃ¨que'.
2. Via less: 'Bibliothأ¨que'.
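To see which raw bytes are actually behind the problem word (and therefore which source encoding to pass to iconv), a quick way to dump them in Python:

# Print the raw bytes of the first line containing the problem word
with open("index.txt", "rb") as f:
    for raw in f:
        if b"Biblioth" in raw:
            print(raw)   # e.g. b'Biblioth\xc3\xa8que' would mean UTF-8 bytes
            break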
I checked how many non-ASCII rows this file contains and got an output file, 'index.txt.non_ascii', with hundreds of lines:
pcregrep --color='auto' -n "[\x80-\xFF]" index.txt > index.txt.non_ascii
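The same check as a small Python sketch, equivalent to the pcregrep above:

# Flag lines containing any non-ASCII byte (same test as [\x80-\xFF])
with open("index.txt", "rb") as f:
    for lineno, raw in enumerate(f, 1):
        if any(b > 0x7F for b in raw):
            print(lineno, raw)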
I also tried to write a short script (in Perl) that reads the data and stores it as UTF-8, but the strings were corrupted again.
I would really appreciate it if someone could assist me with this problem.
Thanks in advance!
Mike
I have database records in an MS Excel file. I save it as a CSV file and then create a database in Firefox's SQLiteManager by importing that CSV file.
But characters like ..., ', ", and - are converted to �.
I have also tried saving the CSV file in UTF-8 format, but then those characters are converted to Õ.
Does anyone have an idea how to solve this?
Thanks.
Perhaps you might want to consider escaping quotes, e.g. try doubling them ("") in your CSV file. Also pay a bit more attention to the 'Fields enclosed by' section in the SQLiteManager add-on, making sure these fields are enclosed properly.
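A small Python sketch of that kind of export (my illustration, using the standard csv module; the file name and sample row are placeholders):

import csv

# Placeholder row containing the troublesome characters
rows = [["O'Brien", 'He said "hi"', "dash - and dots ..."]]
with open("export.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f, quoting=csv.QUOTE_ALL)  # enclose every field in "
    writer.writerows(rows)  # embedded quotes are doubled ("") automatically

Then set 'Fields enclosed by' to " in the SQLiteManager import dialog.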
I'm getting the BCP error "Unexpected EOF encountered in BCP data-file" during import, which is probably misleading. I strongly suspect that some field has been added to the table or that there's some offending character in the file.
How would I go about inspecting the contents of the .dat file visually?
Are there any good hex viewers where I can quickly adjust the row length to see the data in a tabular manner?
Other suggestions are also appreciated.
I guess it depends on your input format. Is it binary input? If so, it's going to be hard. I use Visual Studio to open the file in its binary viewer, but it's far from easy. The usual suspects are CRLFs in a text field, or text that contains your field delimiter or EOL character.
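If you don't have a dedicated hex viewer handy, a minimal Python sketch can serve the same purpose; the file name is a placeholder, and you can vary width to experiment with row lengths:

def hexdump(path, width=16, limit=512):
    # Print offset, hex bytes, and printable ASCII, `width` bytes per row
    with open(path, "rb") as f:
        data = f.read(limit)
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        text = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        print(f"{offset:08x}  {hex_part:<{width * 3}}  {text}")

hexdump("import.dat", width=16)  # adjust width until rows line up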