In a server migration (from old server A to new server B), I compressed the folder /home/user (620M according to "du -sh") with the command
tar -zcpf user.tar.gz /home/user/ >> /log.txt
then I moved user.tar.gz to the new server by curl/ftp, gave the command
tar -xzf user.tar.gz -C /home/ >> /extract_log.txt
and the resulting /home/user/ directory has a size of 625M!
How is this possible? The number of files inside is the same, and if I check a sub-folder whose size differs (the differences were found with "ls -l"), the files inside it appear identical with "ls -l".
Is it perhaps due to the different machines/hard drives? (The home partitions are both ext4.)
The files can all be identical but take more space if the new server has a larger block size, e.g. 8K instead of 4K. The space a file takes is rounded up to a whole number of blocks, so in that example about half of the files would each take up 4K more.
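If you want to confirm this, compare the filesystem block size on both servers and check the directory's logical (apparent) size against its allocated size. A minimal sketch; the device name is only an example:
# filesystem block size (stat works on the mount point, tune2fs needs the device)
stat -f /home
sudo tune2fs -l /dev/sda2 | grep 'Block size'
# logical (apparent) size vs. allocated size of the directory
du -sh --apparent-size /home/user
du -sh /home/user
If the apparent sizes match on both servers but the allocated sizes differ, block-size rounding is the culprit.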
I have a list of file names in a text file (FileList.txt) on the left-hand side, and on the right-hand side I have a folder containing the actual files. I have to compare the left-hand FileList.txt with the right-hand directory's files (recursively) and copy the delta using rsync. I am using the command below, but no files are being copied.
Below is the dry-run attempt:
rsync -rvnc --include-from=/cygdrive/c/Users/SG066221/Desktop/scripts/diff_Lib_WITH_EMPLTY.txt /cygdrive/c/Users/SG066221/Desktop/scripts/FROM_LIST_2_ANOTHER/ 1>C:\Users\SG066221\Desktop\scripts\diff_FINAL.txt
Output is :
sending incremental file list
drwx------ 0 2018/11/12 14:26:18 .
sent 38 bytes received 64 bytes 204.00 bytes/sec
total size is 0 speedup is 0.00 (DRY RUN)
The correct syntax for rsync is:
rsync <options> <include> <exclude> src/ dest/
Your problems:
If you only list one directory, nothing will happen.
If you have includes without excludes then it'll include everything.
(You have dry-run set, but you probably knew that.)
Try this:
rsync -rvc --include-from=file.txt --exclude='*' src/ dest/
Make sure that file.txt contains only the filenames within src/ (i.e. with "src/" stripped off). Make sure that any sub-directories you want files copied from are listed too, on their own line (alternatively, add --include='*/' before the exclude).
What this says is: copy from src/ to dest/, including the files listed in file.txt and excluding everything else.
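For illustration, a minimal sketch with made-up names (file.txt lists paths relative to src/, one per line, with any sub-directories on their own lines):
# file.txt might contain:
#   reports/
#   reports/summary.csv
#   notes.txt
rsync -rvnc --include-from=file.txt --include='*/' --exclude='*' src/ dest/   # dry run first
rsync -rvc --include-from=file.txt --include='*/' --exclude='*' src/ dest/    # then the real copy
Note that --include='*/' lets rsync descend into every sub-directory, so it will also create empty directories in dest/; drop it if you list the needed directories in file.txt instead.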
As per the Hadoop source code, the following descriptions are pulled from the relevant classes:
appendToFile
"Appends the contents of all the given local files to the
given dst file. The dst file will be created if it does not exist."
put
"Copy files from the local file system into fs. Copying fails if the file already exists, unless the -f flag is given.
Flags:
-p : Preserves access and modification times, ownership and the mode.
-f : Overwrites the destination if it already exists.
-l : Allow DataNode to lazily persist the file to disk. Forces replication factor of 1. This flag will result in reduced durability. Use with care.
-d : Skip creation of temporary file(<dst>._COPYING_)."
I am trying to regularly update a file in HDFS as it is updated dynamically from a streaming source on my local file system.
Which one should I use out of appendToFile and put, and Why?
appendToFile modifies the existing file in HDFS, so only the new data needs to be streamed/written to the filesystem.
put rewrites the entire file, so the entire new version of the file needs to be streamed/written to the filesystem.
You should favor appendToFile if you are just appending to the file (i.e. adding logs to the end of a file). This function will be faster if that's your use case. If the file is changing more than just simple appends to the end, you should use put (slower but you won't lose data or corrupt your file).
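As a rough sketch of the difference (the paths are made-up examples):
# append only the newly arrived data to the existing HDFS file
hdfs dfs -appendToFile /local/stream/new_chunk.log /data/app/events.log
# rewrite the whole file with the latest local copy
hdfs dfs -put -f /local/stream/events.log /data/app/events.log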
The basic command when working with command-line operation in Arelle is:
python arelleCmdLine.py arguments
provided we first cd into the folder where Arelle is installed.
I have devoted huge amounts of time, but I cannot find whether there is a command in the documentation (only about two pages) that can output ratios (e.g. Current Ratio) or metrics (e.g. Revenue), instead of having to download all the data in columns and filter it. I must admit that I cannot understand some of the commands in the documentation.
What I am doing to download data is:
python arelleCmdLine.py -f http://www.sec.gov/Archives/edgar/data/1009672/000119312514065056/crr-20131231.xml -v --facts D:\Save_in_File.html --factListCols "Label Name contextRef unitRef Dec Prec Lang Value EntityScheme EntityIdentifier Period Dimensions"
-f is the option that loads the data; it is followed by the web location of the filing
-v tells Arelle to validate the data that is pulled
--facts saves the data in an HTML file in the designated directory
--factListCols is the list of columns I choose to include (the command above takes all the available columns).
There are practically zero tutorials.
Arelle only runs on Python 3 and can be downloaded without any hassle by following these quick and simple steps.
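Since the documentation is so thin, the built-in option listing is often the quickest reference; a minimal sketch, run from the Arelle folder as above:
python arelleCmdLine.py --help
This prints the supported command-line arguments with a short description of each.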
I have a bunch of text files I need to temporarily concatenate so that I can pass a single file (representing all of them) to some post-processing script.
Currently I am doing:
zcat *.rpt.gz > tempbigfile.txt
However this tempbigfile.txt is 3.3GB, while the original size of the folder with all the *.rpt.gz files is only 646MB! So I'm temporarily more than quadrupling the disk space used. Of course, after I call myscript.pl with tempbigfile.txt, I'm done and can rm tempbigfile.txt.
Is there a solution to not create such a huge file and still get all those files together in one file object?
You're decompressing the files with zcat, so you should compress the text once more with gzip:
zcat *.rpt.gz | gzip > tempbigfile.txt
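A usage sketch of the round trip; the way myscript.pl is invoked here is an assumption, since the script now receives gzip-compressed data unless it is decompressed on the fly:
zcat *.rpt.gz | gzip > tempbigfile.txt
zcat tempbigfile.txt | myscript.pl /dev/stdin   # assumes the script accepts any readable file path
rm tempbigfile.txt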
I have a folder with a few files in it; I like to keep my folder clean of any stray files that can end up in it. Such stray files may include automatically generated backup files or log files, but could be as simple as someone accidentally saving to the wrong folder (my folder).
Rather than have to pick through all this all the time, I would like to know if I can create a batch file that only keeps a number of specified files (by name and location) but deletes anything not on the "list".
[edit] Sorry, when I first saw the question I read bash instead of batch. I am not deleting this not-so-useful answer since, as was pointed out in the comments, it could be done with Cygwin.
You can list the files, exclude the ones you want to keep with grep, and then submit the rest to rm.
If all the files are in one directory:
ls | grep -v -f ~/.list_of_files_to_exclude | xargs rm
or in a directory tree
find . | grep -v -f ~/.list_of_files_to_exclude | xargs rm
where ~/.list_of_files_to_exclude is a file with the list of patterns to exclude (one per line).
Before testing it, make a backup copy and substitute rm with echo to see whether the output is really what you want.
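A small sketch of what the exclude file and a dry run could look like (the patterns are made-up; grep treats each line as a regular expression, so anchoring helps avoid accidental partial matches against the find output):
# ~/.list_of_files_to_exclude
important_report\.txt$
^\./keep_this_dir/
# dry run: print the deletion candidates instead of deleting them
find . | grep -v -f ~/.list_of_files_to_exclude | xargs echo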
A white list for file survival is an incredibly dangerous concept. I would strongly suggest rethinking that.
If you must do it, might I suggest that you actually implement it thus:
Move ALL files to a backup area (one created per run, such as a directory named after the current date and time).
Use your white list to copy back the files that you wanted to keep, for example with copy c:\backups\2011_04_07_11_52_04\*.cpp c:\original_dir.
That way, you keep all the non-white-listed files in case you screw up (and you will at some point, trust me), and you don't have to worry about negative logic in your batch file (remove all files that aren't of any of these types), instead using the simpler option (copy back every file that is of each type).
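A rough sketch of that idea in shell, in the spirit of the Cygwin suggestion above; the backup location and list file name are examples only:
backup_dir=~/backups/$(date +%Y_%m_%d_%H_%M_%S)
mkdir -p "$backup_dir"
mv ./* "$backup_dir"/                 # step 1: move ALL files out of the folder first
while read -r f; do                   # step 2: copy back only the white-listed files
    cp "$backup_dir/$f" ./
done < ~/.list_of_files_to_keep       # hypothetical white list: one filename per line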