Flink batch ReadCSV - zip file


I am writing a batch job based on
https://github.com/dataArtisans/flink-training-exercises/blob/master/src/main/java/com/dataartisans/flinktraining/exercises/dataset_java/mail_count/MailCount.java
In the following code, the input has to be a .csv file; otherwise I get an error. I tried a .zip file with a CSV in it. In MailCount.java, I see that readCsvFile accepts a .gz file as input and works fine. Could you please help?
env.readCsvFile(input)
.ignoreFirstLine()
.includeFields(fields)
.types(String.class,String.class);
Thanks
Aruna

Flink supports reading compressed files out of the box, if the files have a proper extension. However, not all types of compression are supported. You can find the list of supported compression types in [1].
For example, .gz is supported, that's why the example works, but .zip isn't, so you get an error.
Best regards,
Konstantin
[1] https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/batch/index.html#read-compressed-files
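One possible workaround, sketched here under assumptions, is to unpack the CSV from the .zip before handing the path to readCsvFile. The helper below uses only the JDK; the class name ZipCsvExtractor and the method extractFirstCsv are hypothetical and not part of Flink. It assumes the archive holds at least one entry whose name ends in .csv.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper (not a Flink API): unpacks a CSV so Flink can read it as plain text.
final class ZipCsvExtractor {

    // Copies the first non-directory entry ending in ".csv" to a new temp file and returns its path.
    static Path extractFirstCsv(Path zipFile) throws IOException {
        try (InputStream in = Files.newInputStream(zipFile);
             ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!entry.isDirectory() && entry.getName().toLowerCase().endsWith(".csv")) {
                    Path out = Files.createTempFile("extracted-", ".csv");
                    // ZipInputStream reports end-of-stream at the end of the current entry,
                    // so this copies exactly one entry.
                    Files.copy(zip, out, StandardCopyOption.REPLACE_EXISTING);
                    return out;
                }
            }
        }
        throw new IOException("No .csv entry found in " + zipFile);
    }
}
```

The returned path could then be passed on, for example env.readCsvFile(ZipCsvExtractor.extractFirstCsv(Paths.get(input)).toString()). This only makes sense when the extracted file is reachable by every worker (for instance in local execution); re-compressing the data as .gz is the other obvious route.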

Related

Compress file to tgz with apache camel

I want to compress a file to ".tgz" extension using apache camel 3.11.1.
As the gzip format is not supported, I tried GZipDeflater data format but it is not working (the file remains unchanged).

I found a solution, maybe it can help someone else.
The solution was to use gzipDeflater DataFormat, and explicitly provide the extension in the file endpoint when producing the result:
from("file:...../in?noop=true")
.marshal().gzipDeflater()
.to("....../out1?fileName=${file:name.noext}.tgz");
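One caveat worth keeping in mind: .tgz conventionally denotes a gzip-compressed tar archive, while a gzip step on its own only compresses the bytes of the single file. The JDK-only sketch below is not Camel code; it uses a hypothetical class GzipOnly to show what plain gzip produces, under the assumption that gzipDeflater behaves like a standard gzip stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical illustration class; mirrors a plain gzip step using only the JDK.
final class GzipOnly {

    // Compresses the given bytes into a gzip stream (no tar layer is added).
    static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buffer)) {
            gz.write(data);
        }
        return buffer.toByteArray();
    }

    // Decompresses a gzip stream back to the original bytes.
    static byte[] gunzip(byte[] compressed) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] chunk = new byte[4096];
            int n;
            while ((n = gz.read(chunk)) > 0) {
                out.write(chunk, 0, n);
            }
        }
        return out.toByteArray();
    }

    // A gzip stream starts with the two magic bytes 0x1f 0x8b.
    static boolean hasGzipMagic(byte[] bytes) {
        return bytes.length >= 2 && (bytes[0] & 0xff) == 0x1f && (bytes[1] & 0xff) == 0x8b;
    }
}
```

If the consumer of the .tgz file expects to untar it, a tar step would be needed before the gzip step; if it only needs to gunzip a single file, the route above may be enough.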

ePub mimetype - anyone encountered a situation where it's needed?

I'm wondering if anyone has ever encountered a situation where it was necessary that the mimetype file be put into the zip file first (and uncompressed) to make the ePub work. And I'm not talking about the ability to pass an ePub validation.
I've been trying to write a script to create ePubs and it's not working. I tried several variations of the 7zip flags and every time the validation complains about the mimetype file.
I got fed up and just opened one of the files in Digital Editions and it worked fine. Then opened it in Calibre, dropped it into Chrome (ePub Reader), in iBooks, and even made a Kindle file. Everything worked as expected without throwing up an error.
Is there any situation where this matters...apart from the OCD part of me wanting an error-free file?

The answer appears to be "no"...
But here is a sampling of posts fighting with the mimetype file error:
Make MIMETYPE file the first file in an EPUB ZIP file?
Creating an Epub file with a Zip library
creating *.epub with perl Archive::Zip -- epubchecker error
How to create ePub with System.IO.Compression.ZipArchive?
The mimetype file has an extra field of length n. The use of the extra field feature of the ZIP format is not permitted for the mimetype file
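For readers who do want the validator to stop complaining, the following is a minimal sketch of writing the mimetype entry first and uncompressed with the JDK's ZipOutputStream. The class name EpubMimetypeFirst is hypothetical, the result is not a complete ePub (it lacks the package document and content), and it assumes the JDK writes no extra field for an entry configured this way.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical sketch: builds an archive whose first entry is an uncompressed "mimetype" file.
final class EpubMimetypeFirst {

    static final String MIMETYPE = "application/epub+zip";

    static byte[] buildArchive(String containerXml) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(buffer)) {
            byte[] mime = MIMETYPE.getBytes(StandardCharsets.US_ASCII);
            CRC32 crc = new CRC32();
            crc.update(mime);

            // STORED entries need size and CRC set up front, before putNextEntry.
            ZipEntry first = new ZipEntry("mimetype");
            first.setMethod(ZipEntry.STORED);
            first.setSize(mime.length);
            first.setCompressedSize(mime.length);
            first.setCrc(crc.getValue());
            zip.putNextEntry(first);
            zip.write(mime);
            zip.closeEntry();

            // Later entries can use the default DEFLATED method.
            zip.putNextEntry(new ZipEntry("META-INF/container.xml"));
            zip.write(containerXml.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        return buffer.toByteArray();
    }
}
```

With this layout the string application/epub+zip sits at a fixed offset near the start of the file, which is presumably why validators insist on it even though the readers mentioned above do not.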

How to open files in compressed format. ZFC

How can I open files in the compressed .ZFC format on Windows?
The format is described here:
http://file.downloadatoz.com/zfc-file-extension/
I'm trying to unzip this file
http://fluxxy.com.br/rom.zfc
Thanks to anyone who can help me!

This appears to be a proprietary file format.
It contains no header matching any well-known archive format, so you won't be able to decompress it unless you find RFC-like documentation for it.
You won't get through it easily, since you'll need the binary mapping from Alistair to do so.

Read omni.jar archive file with MiniZip library

Firefox stores its default configuration information in omni.jar (older versions) or omni.ja (later versions). Both omni.jar and omni.ja are just ZIP-format files, so many programs and libraries can compress or decompress them.
I want to get some default information from Firefox, so I need to read some files inside those omni archives. I have used the 7zip program to view the contents of the omni file, and the MiniZip/Zlib library to read it in my program.
With later versions (omni.ja), it reads fine. But with older versions, MiniZip cannot open the omni.jar file. I then tried 7zip on those files: omni.ja was OK, but omni.jar failed. Yet some other programs, e.g. WinRar and WinZip, open both omni files fine.
I googled and found some information: in older versions, Firefox created the omni.jar file (a ZIP-format file) in a non-standard way. But then why can WinRar or WinZip read it?
Can anyone help me get MiniZip to read the omni.jar file without errors?
Thank you very much.

The solution is pretty simple: Your "old" omni.jar is broken. My omni.jar starts with PK.
I suggest downloading Firefox from the official archive once more.
[EDIT] It seems that different builds of Firefox use different tools to build the ZIP archive. The en-US version uses a ZIP-like format which doesn't start with PK. While in theory the file format is valid (it contains data followed by the list of entries), almost no tool really supports this format (so WinZIP and WinRAR are the exceptions).
This intention is reflected by the rename of omni.jar to omni.ja: It's not encrypted, it's just a ZIP format that isn't widely supported and the US Firefox developers don't want to change this.
The other developers (for example for the de version) use official tools to build the omni archive so those versions can be modified with any tool.
You will need to find a way to update the archive using WinZIP / WinRAR or you need to download the original firefox sources and add your patches to the build process.
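As a general illustration of why a ZIP file need not start with PK: in the standard format, readers locate the entry list through the End of Central Directory record near the end of the file, not through the first bytes. The sketch below uses a hypothetical class EocdLocator and makes no claim about the specific layout of Firefox's omni.jar; it only shows the scan-from-the-end idea.

```java
// Hypothetical sketch: finds the ZIP End of Central Directory (EOCD) record by scanning backwards.
final class EocdLocator {

    private static final int EOCD_MIN_SIZE = 22;      // fixed part of the record
    private static final int MAX_COMMENT_SIZE = 65535; // optional trailing comment

    // Returns the offset of the EOCD signature "PK\5\6", or -1 if none is found.
    static int findEocd(byte[] file) {
        int lowest = Math.max(0, file.length - EOCD_MIN_SIZE - MAX_COMMENT_SIZE);
        for (int i = file.length - EOCD_MIN_SIZE; i >= lowest; i--) {
            if (file[i] == 'P' && file[i + 1] == 'K' && file[i + 2] == 5 && file[i + 3] == 6) {
                return i;
            }
        }
        return -1;
    }

    // True when the file begins with the "PK" bytes that many tools check first.
    static boolean startsWithPk(byte[] file) {
        return file.length >= 2 && file[0] == 'P' && file[1] == 'K';
    }
}
```

A tool that checks only startsWithPk would reject an archive with leading data in front of the entries, while a tool that searches for the EOCD record could still find the entry list. That may be part of the difference between the tools mentioned above, though this is a guess.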

File extension .DB - What kind of database is it exactly?

I have a database file with a .DB file extension. I have been googling and it looks like it could be SQLite. I tried to connect to it using the SQLite and SQLite3 drivers, and I am getting the error "File is encrypted or not a database".
So I don't know whether the file is encrypted or is simply not an SQLite database. Are there any other formats that the .DB extension could indicate? How do I find out whether the file is encrypted?
I tried opening it in a text editor, and it is mostly a mess of characters with a few visible words here and there. I have uploaded the file here: http://cl.ly/3k0E01373r3v182a3p1o for a closer look.
Thank you for any hints and ideas on what to do and how to work with this file.

Marco Pontello's TrID is a great way to determine the type of any file.
TrID is simple to use. Just run TrID and point it to the file to be analyzed. The file will be read and compared with the definitions in the database. Results are presented in order of highest probability.
Just download the executable and the latest definitions file into the same directory and then run TrID:
trid.exe "path/to/file.xyz"
It will output a list of possible file types for the file with a confidence rating.
There's also a GUI version called TrIDNet.

If you're on a Unix-like platform (Mac OS X, Linux, etc), you could try running file myfile.db to see if that can figure out what type of file it is. The file utility will inspect the beginning of the file, looking for any clues like magic numbers, headers, and so on to determine the type of the file.

Look at the first 30 bytes of the file (open it in Notepad, Notepad++ or another simple text viewer). There's usually some kind of tag or extension name in there.
Both SQLite 2 and SQLite 3 have a very clear message: SQLite format 3 for SQLite 3 (obviously) and This file contains an SQLite 2.1 database for SQLite 2.
Note that encrypted SQLite databases don't have a header like that, since the entire file is encrypted.
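The header check described above can also be done programmatically. The following is a minimal sketch, assuming only the signatures mentioned in this thread plus the well-known ZIP and gzip magic bytes; the class FileSniffer and its methods are hypothetical names, and an "unknown" result does not prove that a file is encrypted.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical sketch: guesses a file type from its first bytes.
final class FileSniffer {

    // Reads up to maxBytes from the start of the file.
    static byte[] readHead(Path file, int maxBytes) throws IOException {
        byte[] buf = new byte[maxBytes];
        int total = 0;
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while (total < maxBytes && (n = in.read(buf, total, maxBytes - total)) > 0) {
                total += n;
            }
        }
        return Arrays.copyOf(buf, total);
    }

    // Compares the leading bytes against a handful of known signatures.
    static String describe(byte[] head) {
        String text = new String(head, StandardCharsets.ISO_8859_1);
        if (text.startsWith("SQLite format 3\0")) {
            return "SQLite 3 database";
        }
        if (text.contains("This file contains an SQLite 2.1 database")) {
            return "SQLite 2 database";
        }
        if (text.startsWith("PK\u0003\u0004")) {
            return "ZIP archive";
        }
        if (head.length >= 2 && (head[0] & 0xff) == 0x1f && (head[1] & 0xff) == 0x8b) {
            return "gzip stream";
        }
        return "unknown (possibly encrypted or proprietary)";
    }
}
```

A call such as FileSniffer.describe(FileSniffer.readHead(Paths.get("myfile.db"), 64)) would cover the first-30-bytes check suggested above; tools like TrID or file know far more signatures than this sketch does.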

On a Unix-like system (or Cygwin under Windows), the strings utility will search a file for strings, and print them to stdout. Might help you narrow the field.
There are a lot of programs besides database programs that use a "db" extension, including
ArcView Object Database File (ESRI)
MultiEdit
Netscape
Palm
and so on. Google "file extensions" for some sites that catalog file extensions and the programs that use them.

There's no conclusive way to know, because in an encrypted SQLite database the entire file is encrypted, including the header.
Further, there's not a lot of difference to you, except for possible error text to a user if you're prompting them for a password.
