How to read LZO compressed HDFS files in Giraph

I am looking for an input format for Giraph that can read LZO-compressed files. It appears that the input format GiraphRunner uses by default is BspInputFormat, which makes no mention of LZO compression. Is this simply an oversight, and will I have to implement my own LzoBspInputFormat class? It feels like something someone has already done.

It turns out I asked this before even trying. I am loading LZO-compressed files into my Giraph job via GiraphRunner and it accepts the input fine. This was a complete non-issue.

Related

Snowflake PUT command, AUTO_COMPRESS vs gzip compressed file performance

Can someone suggest which of the options below will perform better with the PUT command?
Uploading a file with AUTO_COMPRESS=true.
Uploading an already compressed (gzip) file with AUTO_COMPRESS=false.

There's no harm in leaving AUTO_COMPRESS=true, because if a file is already compressed, the PUT command won't try to compress it a second time. There is an important caveat, though: a file that is already compressed must use a supported compression method. You can get a list of supported methods here: https://docs.snowflake.com/en/sql-reference/sql/put.html
Compressing, either beforehand or via AUTO_COMPRESS, is advisable since it reduces network transfer time and bandwidth consumption. Compression uses CPU and IO on the machine doing the PUT operation. If that machine is maxed out (I've seen some cases of VMs on oversubscribed systems, for example), it would be better to compress the file before sending it to the machine doing the PUT, because the PUT operation already uses a lot of CPU and IO to encrypt the files prior to upload.
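As a rough illustration of the "compress before PUT" option, here is a minimal Python sketch. The helper names `gzip_file` and `put_statement` and the stage name `my_stage` are made up for this example; the statement it builds uses the AUTO_COMPRESS and SOURCE_COMPRESSION options of PUT. It only builds the statement text and assumes a POSIX-style path; running it is left to whatever client you use.

```python
import gzip
import shutil
from pathlib import Path


def gzip_file(src: Path) -> Path:
    """Compress src to '<src>.gz' next to it and return the new path."""
    dst = src.with_name(src.name + ".gz")
    with open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)  # streams in chunks, so large files are fine
    return dst


def put_statement(local_file: Path, stage: str) -> str:
    """Build a PUT statement for a file that is already gzip-compressed.

    The stage name is a placeholder supplied by the caller (hypothetical here).
    """
    return (
        f"PUT file://{local_file.resolve().as_posix()} @{stage} "
        "AUTO_COMPRESS=FALSE SOURCE_COMPRESSION=GZIP"
    )
```

The compression step can then run on a machine with spare CPU, and only the `.gz` file has to reach the host that executes the PUT.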

Legacy dos system with flat file data store (ISAM-Files)

I have a legacy system which used to run on DOS. It is an ERP system for retail stores (fashion). I think it stores its data in flat files.
I have files ending with *.KEY and other files ending with *.D00 (counting up).
I think the key files hold the key information and the D-files hold the data... there are a lot of D77 files...
As far as my investigation goes, this is not DBF (dBase) or FoxPro; it could be proprietary...
The company that wrote it is out of business, of course, so there is no chance of support or any hints.
When I open these files in vim or other editors I get some binary characters and some text... I tried hex mode but still found nothing usable...
Is there any chance I can dump the data out... as CSV, ASCII, XML?
I am pretty sure that this is not a standard format. Can someone point me in the right direction: how was such data stored back in the day, and how could I make it readable?
Any tools, tips or tricks?
// EDIT
After some time I made some progress and can now post some details which I did not know of back then, and whose absence made a good answer impossible.
I assume that the DOS system was written in Visual COBOL and that the files could be B-tree files stored in ISAM format. The closest guess I can offer is that the format may be C-ISAM.
How can I access, view or modify these files? C#, Java, Ruby... any modern language would be cool. I am not sure if I can handle COBOL. It would be great to have a converter or a viewer tool, preferably open source.
I hope this clarifies my question =)

OpenCOBOL has a very active user group. The language itself is free and runs on Linux and Windows, and perhaps Mac OS X. Have a chat with the user group there; they may be able to help.

Peachtree Accounting Software used those file extensions back in 1992.
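Whatever the origin of the format, a common first step with unknown binary data files is to list the readable text they contain together with its byte offsets. The following is only an exploratory sketch in Python, not a decoder for C-ISAM or any other specific format; the function names are made up for this example, and it assumes the text fields are plain ASCII.

```python
import re


def printable_runs(data: bytes, min_len: int = 4):
    """Yield (offset, text) for each run of printable ASCII of at least min_len bytes."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    for match in pattern.finditer(data):
        yield match.start(), match.group().decode("ascii")


def dump_file(path: str, min_len: int = 4) -> None:
    """Print every printable run in the file, prefixed by its hex offset."""
    with open(path, "rb") as f:
        data = f.read()
    for offset, text in printable_runs(data, min_len):
        print(f"{offset:08x}  {text}")
```

If the offsets of similar-looking fields turn out to be evenly spaced, that spacing is a reasonable first guess at a fixed record length, which is worth checking before looking for a converter.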

What is tsr file

I have a requirement to export data to a tsr file only. Previously we exported to a csv file.
Can anyone tell me what a tsr file is and what its content format is? Are the contents in some way similar to csv?

Google does not come up with something convincing. I suggest you ask your client to explain the requirement in more detail. TSR does not seem to be a generally understood format.
Maybe TSV (tab-separated values, VERY similar to CSV)?
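If the requirement does turn out to mean TSV, the change from an existing CSV export is usually small. A minimal Python sketch, assuming the export already produces rows as lists (the function name `rows_to_delimited` and the sample rows are made up for illustration):

```python
import csv
import io


def rows_to_delimited(rows, delimiter: str) -> str:
    """Serialize rows with the given field delimiter (',' for CSV, a tab for TSV)."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=delimiter, lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()


rows = [["id", "name"], [1, "Smith, John"]]
csv_text = rows_to_delimited(rows, ",")   # the comma inside the name forces quoting
tsv_text = rows_to_delimited(rows, "\t")  # no quoting needed: the field has no tab
```

Only the delimiter changes; quoting rules follow from it, which is why a field containing a comma is quoted in the CSV output but not in the TSV output.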

A TSR file is a shared object repository file. You typically see this type of file in HP UFT: whenever you create an object in UFT, it goes into a repository with the extension .tsr.

See this: http://www.computerfileextensions.com/file-extensions.php/TSR
But the results are not very clear or convincing.
Can't you ask your client for a sample tsr file?

Here is another result related to TSR files, though it's probably not the format the OP was looking for.
The TSR file extension is related to TSR Launcher, a fan-made program for The Sims 3 computer game series that allows users to quickly install Sims3Pack files without using the Launcher provided by EA. It looks like the program is no longer available for download.
A *.tsr file is used for the download basket function; most likely it is some sort of container.
http://www.file-extensions.org/tsr-file-extension

Read and write directly from and to compressed files in C

In Java, I think it is possible to browse through jar files as if they were not compressed. Is there a similar (and portable) thing in C/C++?
I would like to import binary data into memory from a large (zipped or similar) file without decompressing it to disk first, and afterwards write it back to disk in compressed form.
Maybe some trick with shell pipes and the zip utility?

I think you want zlib:
http://www.zlib.net/
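To illustrate the idea of streaming through a compressed file without an uncompressed copy on disk, here is a minimal sketch in Python, whose `gzip` module is built on the same zlib library. In C, zlib's `gzopen`, `gzread`, `gzwrite` and `gzclose` follow the same open/read/write pattern. The function names below are made up for this example. Note that this covers gzip streams; a .zip archive is a different container that zlib's core API does not read by itself.

```python
import gzip


def write_compressed(path, payload: bytes) -> None:
    """Write payload to disk in gzip-compressed form."""
    with gzip.open(path, "wb") as f:
        f.write(payload)


def read_compressed(path, chunk_size: int = 64 * 1024):
    """Yield decompressed chunks; nothing uncompressed is ever written to disk."""
    with gzip.open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            yield chunk
```

Reading in chunks keeps memory use bounded for large files; joining the chunks with `b"".join(...)` gives the whole payload in memory when that is what you need.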

Reading ID3 tags of a remote mp3 file?

"Read MP3 Tags with Silverlight" got me started with reading ID3 tags, but I realize that TagLib# only deals with local file paths.
Is there a way of reading this info from a remote file?

I recently answered the same question for Ruby (see below), and I'm pretty sure you can do something similar.
The idea is:
Use HTTP 1.1 or higher and an HTTP Range request.
Download the beginning section (the first 100 bytes) of the file, which contains the start of the ID3v2 tag.
From the first few bytes downloaded, determine the length of the complete ID3v2 tag, say N.
Download the first N bytes of the file (i.e. the complete ID3v2 tag).
Parse the ID3v2 tag for your purposes.
See:
Read ID3 Tags of Remote MP3 File in Ruby/Rails?
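The steps above can be sketched in Python as follows. This is a minimal sketch under a few assumptions: the function names are made up for this example, the server honours Range requests, and the tag is ID3v2 at the start of the file. The length comes from the 10-byte ID3v2 header, whose last four bytes hold the tag size as a "synchsafe" integer (7 bits per byte), not counting the header itself.

```python
import urllib.request

ID3V2_HEADER_LEN = 10


def id3v2_tag_length(header: bytes):
    """Return the total ID3v2 tag length in bytes, or None if no valid header is found."""
    if len(header) < ID3V2_HEADER_LEN or header[:3] != b"ID3":
        return None
    flags = header[5]
    size = 0
    for byte in header[6:10]:
        if byte & 0x80:          # synchsafe bytes never have the top bit set
            return None
        size = (size << 7) | byte
    footer = ID3V2_HEADER_LEN if flags & 0x10 else 0  # optional 10-byte footer
    return ID3V2_HEADER_LEN + size + footer


def fetch_range(url: str, start: int, end: int) -> bytes:
    """Fetch bytes start..end (inclusive) with an HTTP Range request."""
    req = urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})
    with urllib.request.urlopen(req) as resp:
        # Cap the read in case the server ignores Range and sends the whole file.
        return resp.read(end - start + 1)


def fetch_id3v2_tag(url: str):
    """Download only the ID3v2 tag of a remote MP3, or None if it has none."""
    length = id3v2_tag_length(fetch_range(url, 0, 99))
    if length is None:
        return None
    return fetch_range(url, 0, length - 1)
```

The returned bytes can then be handed to whatever tag parser you use, as long as it accepts a stream or buffer rather than a local path.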

Tim Heuer has a good blog post on doing this. http://timheuer.com/blog/archive/2010/01/30/reading-mp3-id3-tags-with-silverlight-taglib.aspx
Like you, he also ran into the problem of TagLib# only using local paths. He writes:
One thing that TagLib# didn’t have was a stream input implementation. Most of the libraries, in fact, assumed a local file path. Luckily the library was written using a generic ‘File’ interface, so I just had to create my own StreamFileAbstraction. I chose to do this within my project rather than the base library. It was easy since the LocalFileAbstraction actually perfomed an Open on the file as it’s first task and set some public variables. My abstraction basically just hands the stream already and ready to go.
There is an example on the Novell site that uses a file abstraction.
http://developer.novell.com/wiki/index.php/TagLib_Sharp:_Examples
