I am trying to understand how the RC file format works, so I stored a file in a Hadoop cluster with a row group size of 3 bytes to ensure that my data is spread across 2-3 row groups.
After loading, to check how the contents are organized, I downloaded the RC file and opened it with xxd /Path/To/Downloaded/File. The content is displayed in hexadecimal, but I suspect the file also contains data in some other format, which is why I cannot read the content.
The file in text and binary format, opened using xxd, is as follows:
Could someone help me understand the contents of a file in RC format?
Thanks,
Sree
There is a Hive utility, rcfilecat, for reading RC files. Something like:
ggk@hadoop4:~/Downloads$ hive --rcfilecat 000000_0
References:
Documentation
Java doc
I wanted to see the file content as is. rcfilecat deserializes the data and rearranges it into record format. I used the following to see the raw contents:
sudo xxd /path/to/downloaded/file
Thanks,
Sree
Related
I have some binary blobs (BLOBs from MySQL). These are supposed to be audio streams recorded using a JS web app.
I took one of these blobs and saved it as a.wtf. When I ran strings a.wtf, I got some useful information.
webmB
QTmuxingAppLibWebM-0.0.1WA
QTwritingAppLibWebM-0.0.1
A_OPUSc
OpusHead
OPUS
...
I also tried the following in the terminal (based on tips found via Google).
[dilawars@chutki data (master)]$ mkvextract a.wtf tracks 0:audio.opus
Error: (mkvextract) The file 'a.wtf' could not be opened for reading: Not a valid Matroska file (no segment/level 0 element found).
Download a.wtf.
Any help is very much appreciated. Ideally, I'd like to convert them to WAV format.
Update
I used this tool.
[dilawars@chutki data (master)]$ hachoir-metadata a.wtf
[err!] Unable to parse file: a.wtf
Thanks to the tip by @bryc, I managed to find a solution. The data in MySQL is base64-encoded (the uploaded file a.wtf is already in binary format). I decoded it back to a binary stream and saved it as a.webm. After that, I ran the following command.
$ ffmpeg -i a.webm -ac 1 -f wav -vn -ar 20500 a.wav
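The base64 decoding step mentioned above could look roughly like this in Python. This is a minimal sketch, not the code actually used: the function name save_base64_blob is made up for illustration, and it assumes the blob has already been fetched from MySQL as base64 text.

```python
import base64


def save_base64_blob(b64_text: str, out_path: str) -> int:
    """Decode base64 text to raw bytes and write them to out_path.

    Returns the number of bytes written. The name and signature are
    illustrative only; they are not part of any library.
    """
    raw = base64.b64decode(b64_text)
    with open(out_path, "wb") as f:
        f.write(raw)
    return len(raw)
```

As a quick sanity check, WebM is an EBML-based container, so a correctly decoded file would be expected to start with the bytes 1A 45 DF A3.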
I have a file that holds manufacturing orders for a machine.
I would like to read the content of this file and edit it, but when I open it in a text editor, e.g. Notepad++, I get a bunch of weird characters:
xÚ¥—_HSQÀo«a)’êaAXŽâê×pD8R‰¬©s“i+ƒ´#¡$
-þl-ó/ÓíºIúPôàƒHˆP–%a&RÎÈn÷ü¹·;Ú;ç<ìòÝÃý}¿ó}‡{϶«rWg>˜›ãR‡)Çn0³Ûf³yÎW[5–šw½ÇRW{ñ’rO6¹ŽŸp¦ÙœcÏ.9yÀnýg
)Ë—e90ejÕø£rC. f¦}3ËŒ˜hü”å1g[…ø±ú ÜJøz®‹˜YfÈ,4`ŽKÉ—ù“ÔË¿d„þlG3#=˜Ž´+hF¬¦£€«šm¿áØ
ïÖµv‡ËpíÍ~™‡Aù
šëÈÚ]ÿç™DŒÉFØ ïƒæsij ¦y=-74Æ/t=ÕŠr\˜š»Âä‰Ý¨žã΢
dz·à‡'fœ½yâ½4qåPjácòÄŒeÊhñ“ý™ÙÎÕ÷5ôlñ=˜Õ{ú;ø=Û;4OêYä>Ìpxbæâ'è"oëB×1gQ9“'¹]Ô³’Ô³ø!ÌózÞyŸõžÓIŽù*&OÌXPÕ"ŽWžpíOÌè‚Þ3Òr0{Ž†R=_?…/¼žÞ0,ê=/?£ûÓËîy“2Z<ij³[ËÁì™÷–ôžÎ’Ããa÷<Maêéí…¼ž}©žYýZ-˜=”á¤}π>3°¢÷œ$ïè‰3ìž«ƒÄs¿—xnŒÀ*¯gi$ÕómDËÁìùIeоû‡À¬?3°x¾"~ª§c˜öÝÇî颌°›x¾Fßb>Ï}QXÓ{öFi-êÙßóR”œe^Ñ÷ü‘¿g[Lë ŽwJZϘë¹3”³L©gH‚,^Ïe 2ôžWGøëÙ2‚Î
øœL¾ÅqÈäõ,ýç\œË3¾þeྗ&`Ϻ<KÒf“’»ðù]í‰ãžU^wèþåÔÖy”H}ò•6ø6
It looks like the file is encoded.
Any idea how to find the encoding and make the file readable and editable?
It's binary and probably encoded, so without knowledge of the data structure you can't do much. What remains is reverse engineering: make a change, check what changed in the file, and work with a hex editor.
It isn't impossible, though. If you can change the data in a known way (e.g. change the number of orders from 1 to 2) and export to a file, you can compare the binary values and find which byte holds that number. Of course, if it is encrypted and you don't know the key, it's easier to find another way.
For further reading, see https://en.wikibooks.org/wiki/Reverse_Engineering/File_Formats
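The compare-two-exports idea can be sketched in Python as follows. This is only an illustration under the assumption that both exports fit in memory; the function name diff_offsets is hypothetical.

```python
def diff_offsets(old: bytes, new: bytes) -> list[tuple[int, int, int]]:
    """Return (offset, old_byte, new_byte) for each position that differs.

    Only the overlapping part of the two inputs is compared; a change in
    length is a clue in itself and should be checked separately.
    """
    return [(i, a, b) for i, (a, b) in enumerate(zip(old, new)) if a != b]
```

If a one-field change in the application flips only a handful of bytes, those offsets are good candidates for that field. If nearly every byte changes, the file is probably compressed or encrypted and this approach will not help directly.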
If you've got access to a Linux box, why not use
hexdump -C <filename>
You will get much better insight into how the file is structured than by using a text editor.
There are also many "hexdump" equivalents on Windows.
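Where no hexdump tool is at hand, a similar view can be produced with a few lines of Python. This is a rough sketch that loosely imitates the layout of hexdump -C; the function name and the exact column layout are my own choices, not a reimplementation of the real tool.

```python
def hexdump(data: bytes, width: int = 16) -> str:
    """Format bytes as rows of offset, hex values, and printable ASCII."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Show printable ASCII as-is and everything else as a dot.
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        # Pad the hex column so the ASCII column lines up on a short last row.
        lines.append(f"{offset:08x}  {hex_part.ljust(width * 3 - 1)}  |{text_part}|")
    return "\n".join(lines)
```

Reading the file with open(path, "rb").read() and printing hexdump(...) of the first few hundred bytes is usually enough to spot headers and repeating structures.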
I have a binary file (.bin) and a (.txt) file.
Using Python3, is there any way to combine these two files into one file (WITHOUT using any compressor tool if possible)?
If I do have to use a compressor, I want to do it with Python.
As an example, I have 'file.txt' and 'file.bin'. I want a library that takes these two and gives me one file, and that can also un-merge that file again.
Thank you
Just create a tar archive. A module that lets you accomplish this task is already bundled with CPython, and it's called tarfile.
more examples here.
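A minimal sketch of the tarfile approach, assuming the two files from the question. The helper names pack and unpack are made up for illustration; only the tarfile and os calls come from the standard library.

```python
import os
import tarfile


def pack(paths: list[str], archive_path: str) -> None:
    """Bundle the given files into one uncompressed tar archive."""
    with tarfile.open(archive_path, "w") as tar:
        for path in paths:
            # Store only the base name so the archive has no directory parts.
            tar.add(path, arcname=os.path.basename(path))


def unpack(archive_path: str, dest_dir: str) -> list[str]:
    """Write each regular file in the archive into dest_dir; return the names."""
    names = []
    with tarfile.open(archive_path, "r") as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue
            name = os.path.basename(member.name)
            target = os.path.join(dest_dir, name)
            with tar.extractfile(member) as src, open(target, "wb") as dst:
                dst.write(src.read())
            names.append(name)
    return names
```

For example, pack(["file.txt", "file.bin"], "merged.tar") produces the single file, and unpack("merged.tar", "out") restores both (the directory out must already exist). Mode "w" writes a plain tar with no compression, which matches the "without a compressor" requirement.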
There are a lot of solutions for compressing!
gzip or zlib allow compression and decompression and could be part of a solution for your problem, although on their own they compress a single stream rather than bundle several files.
Example of how to GZIP compress an existing file from [http://docs.python.org]:
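import gzip
import shutil

# Compress file.txt into file.txt.gz; the with-block closes both files,
# even if an error occurs while copying.
with open('file.txt', 'rb') as f_in, gzip.open('file.txt.gz', 'wb') as f_out:
    shutil.copyfileobj(f_in, f_out)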
tarfile is also a good solution!
Tar is the best solution if the output can be a binary file.
If you want the output to be a text, you can use base64 to transform binary file into a text data, then concatenate them into one file (using some unique string (or other technique) to mark the point they were merged).
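The base64-plus-marker idea could be sketched like this. The marker string and the function names are hypothetical choices for illustration; the sketch works on bytes so that line endings in the text file are preserved exactly.

```python
import base64

# Hypothetical marker. It contains '-', spaces, and newlines, none of
# which ever appear in the output of base64.b64encode.
MARKER = b"\n-----BASE64 PAYLOAD-----\n"


def merge(text: bytes, binary: bytes) -> bytes:
    """Return the text, the marker, then the binary data as base64."""
    return text + MARKER + base64.b64encode(binary)


def unmerge(merged: bytes) -> tuple[bytes, bytes]:
    """Split a merged blob back into (text, binary)."""
    # Split on the LAST marker: the base64 payload cannot contain it,
    # so this works even if the text part happens to include the marker.
    text, sep, payload = merged.rpartition(MARKER)
    if not sep:
        raise ValueError("marker not found")
    return text, base64.b64decode(payload)
```

The trade-off compared with tar is size: base64 inflates the binary part by roughly a third, in exchange for a merged file that stays plain text.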
I want to read a .gz file (text.gz), 300 MB in size, and search for a pattern in it. I opened the file in binary mode using fopen with "rb" and stored it in a buffer. When I search for a pattern that I know exists in the text, the result is wrong. When I debug the program, the elements of the buffer are different from what I expect. Do I have to read and store this kind of file in some other way?
You might try using zlib and gzread to read the file.
http://zlib.net/manual.html
Try this.
gunzip -c file.gz | grep <pattern>
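The underlying issue is that fopen with "rb" returns the compressed bytes, so the pattern is not present in the buffer in readable form; the data has to be decompressed first. As a rough Python equivalent of the pipeline above (the function name grep_gz is made up, and UTF-8 text is an assumption):

```python
import gzip
import re


def grep_gz(path: str, pattern: str) -> list[tuple[int, str]]:
    """Return (line_number, line) for each decompressed line matching pattern."""
    regex = re.compile(pattern)
    matches = []
    # Mode "rt" decompresses on the fly and yields text lines, so the
    # whole 300 MB file is never held in memory at once.
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as f:
        for number, line in enumerate(f, start=1):
            if regex.search(line):
                matches.append((number, line.rstrip("\n")))
    return matches
```

The same streaming idea applies in C with zlib: open with gzopen, read decompressed chunks with gzread or gzgets, and search those instead of the raw file bytes.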
If the program is exiting and failing to read the file, a very common problem is that the file is still open in Notepad or whatever else is using it, and the file I/O fails because the file cannot be accessed. Make sure nothing else has that file open before you test your program.
Is there a good way to display the contents of a file as binary?
I am creating a program that needs to save and load 2D arrays to and from files.
When loading a saved file, the result appears different.
I need to be able to view the contents of the saved file in plain binary to tell whether my problem is in my save or load function.
Is there a program like octal dump, but for binary?
Thanks.
On Linux/Unix (or Windows + Cygwin) there is the "od" utility, which dumps files in many formats.
E.g. hexadecimal:
od -t x1 file...
I hope it may help you.
Regards
Just for fun, using Ruby from the command line:
cat file | ruby -e "puts STDIN.read.unpack('B*')[0].scan(/[01]{8}/).join(' ')"
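The same output can be produced in Python. This is a small sketch with a made-up function name, doing what the Ruby one-liner does: one group of eight bits per byte.

```python
def to_bits(data: bytes) -> str:
    """Render each byte as eight binary digits, separated by spaces."""
    return " ".join(f"{byte:08b}" for byte in data)
```

For instance, to_bits(b"AB") gives "01000001 01000010". For a file, pass in open(path, "rb").read().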
A raw binary dump is too overwhelming for most people. Consider using od -x, or if you need a more specific format, examine the various options for -t.