Looking to take only main folder name within a tarball & match it to folders to see if it's been extracted

I have a situation where I need to keep .tgz files & if they've been extracted, remove the extracted directory & contents.
In all examples, the only top-level directory within the tarball has a different name than the tarball itself:
[host1]$ find / -name "*\#*.tgz" #(has an # symbol somewhere in the name)
/1-#-test.tgz
[host1]$ tar -tzvf /1-#-test.tgz | head -n 1 | awk '{ print $6 }'
TJ #(directory name)
What I'd like to accomplish (pulling my hair out; rusty scripting fingers), is to look at each tarball, see if the corresponding directory name (like above) exists. If it does, echo "rm -rf /directoryname" into an output file for review.
I can read all of the tarballs into an array ... but how to check the directories?
Frustrated & appreciate any help.

Maybe you're looking for something like this:
find / -name "*#*.tgz" | while IFS= read -r line; do
dir=$(tar ztf "$line" | awk -F/ '{print $1; exit}')
test -d "$dir" && echo "rm -fr '$dir'"
done
Explanation:
We iterate over the *#*.tgz files found with a while loop, line by line
Get the list of files in the tgz file with tar ztf "$line"
Since paths are separated by /, use that as the separator in the awk and print the first field, which is the top-level directory. After the print we exit, making this equivalent to but more efficient than using head -n1 first
With dir=$(...) we put the entire output of the tar..awk chain, thus the first field of the first file in the tar, into the variable dir
We check if such a directory exists (relative to the current directory); if yes, we echo an rm command so you can review it and execute it later if it looks good
My original answer used a find ... -exec but I think that's not so good in this particular case:
find / -name "*#*.tgz" -exec \
sh -c 'dir=$(tar ztf "{}" | awk -F/ "{print \$1; exit}");\
test -d "$dir" && echo "rm -fr \"$dir\""' \;
It's not so good because of running sh for every file, and since we are using {} in the subshell, we lose the usual benefits of a typical find ... -exec where special characters in {} are correctly handled.
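For reference, here is a minimal sketch of the same idea wrapped in a function, assuming all tarballs were extracted into one known directory. The name list_extracted and its two arguments (where to search, where the extracted directories would live) are made up for illustration. It hands the file names to sh as arguments instead of embedding {} in the script text, and it only prints the rm commands for review.

```shell
#!/bin/sh
# list_extracted (hypothetical name): print an "rm -rf" line for every
# tarball whose top-level directory already exists under base_dir.
list_extracted() {
    search_root=$1   # where to look for *#*.tgz files
    base_dir=$2      # assumed location of the extracted directories
    find "$search_root" -type f -name '*#*.tgz' -exec sh -c '
        base_dir=$1; shift
        for tarball do
            # First path component of the first archive member,
            # ignoring a leading "./" if the archive has one
            top=$(tar -tzf "$tarball" 2>/dev/null |
                  sed "s|^\./||" |
                  awk -F/ "{ print \$1; exit }")
            if [ -n "$top" ] && [ -d "$base_dir/$top" ]; then
                printf "rm -rf -- \"%s\"\n" "$base_dir/$top"
            fi
        done
    ' sh "$base_dir" {} +
}
```

Something like list_extracted / /data > review.txt would then collect the commands in a file for review; /data is only a placeholder.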

Adapt file renaming script so that it searches out each file in sub-directories

I have a .csv file which looks something like this:
unnamed_0711-42_p1.mov,day1_0711-42_p1.mov
unnamed_0711-51_p2.mov,day1_0711-51_p2.mov
unnamed_0716-42_p1_2.mov,day1_0716-42_p1_2.mov
unnamed_0716-51_p2_2.mov,day1_0716-51_p2_2.mov
I have written this code to rename files from the name in field 1 (e.g. unnamed_0711-42_p1.mov), to the name in field 2 (e.g. day1_0711-42_p1.mov).
csv=/location/rename.csv
cat $csv | while IFS=, read -r -a arr; do mv "${arr[@]}"; done
However, this script only works when it and all the files that need to be renamed are in the same directory. This was okay previously, but now I need to find files in various subdirectories (without adding the full path to my .csv file).
How can I adapt my script so that is searches out the files in subdirectories then changes the name as before?

A simple way to make this work, though it leads to an inefficient script, is this:
for D in `find . -type d`
do
( cd "$D" && cat "$csv" | while IFS=, read -r -a arr; do mv "${arr[@]}"; done )
done
This runs your command once in every directory below the current one (so $csv must be an absolute path), but it walks the entire list of filenames for every subdirectory, and mv complains about each file that is not there. An alternative would be to search for each file as you process its name:
csv=files.csv
while IFS=, read -ra arr; do
while IFS= read -r -d '' old_file; do
old_dir=$(dirname "$old_file")
mv "$old_file" "$old_dir/${arr[1]}"
done < <(find . -name "${arr[0]}" -print0)
done<"$csv"
This uses find to locate each old filename, then uses dirname to get the directory of the old file (which we need so that mv does not place the renamed file into a different directory).
This will rename every instance of each file (i.e., if unnamed_0711-42_p1.mov appears in multiple subdirectories, each instance will be renamed to day1_0711-42_p1.mov). If you know each file name will only appear once, you can speed things up a bit by adding -quit after -print0 in the find command, so that find stops at the first match (-quit is supported by GNU find, among others).
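If the CSV is long, running find once per row gets expensive. A rough sketch of a single-pass variant follows: find walks the tree once and awk looks each basename up in the CSV. The function name rename_from_csv is hypothetical, and the sketch assumes that names contain no commas or newlines and that the CSV has Unix line endings.

```shell
#!/bin/sh
# rename_from_csv (hypothetical name): walk the tree once and rename every
# file whose basename is in column 1 of the CSV to the name in column 2.
rename_from_csv() {
    csv=$1    # "old,new" pairs, one per line
    root=$2   # directory tree to search
    find "$root" -type f |
    awk -v csv="$csv" '
        BEGIN {
            # Load the whole mapping into memory first
            while ((getline line < csv) > 0) {
                split(line, f, ",")
                map[f[1]] = f[2]
            }
        }
        {
            n = split($0, parts, "/")
            base = parts[n]
            if (base in map) {
                dir = substr($0, 1, length($0) - length(base))
                print $0               # old path
                print dir map[base]    # new path, same directory
            }
        }
    ' |
    while IFS= read -r old && IFS= read -r new; do
        mv -- "$old" "$new"
    done
}
```

Called as rename_from_csv /location/rename.csv . it renames every matching instance, like the nested-loop version does.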

The script below
while IFS=, read -ra arr # -r to prevent mangling backslashes
do
find . -type f -name "${arr[0]}" -printf "mv '%p' '%h/${arr[1]}'\n" | bash
done<csvfile
should do it, as long as no file name contains a single quote.
See the find manpage to understand what -printf specifiers like %p and %h do.

Script to group numbered files into folders

I have around a million files in one folder in the form xxxx_description.jpg, where xxxx is a number ranging from 100 to an unknown upper bound.
The list is similar to this:
146467_description1.jpg
146467_description2.jpg
146467_description3.jpg
146467_description4.jpg
14646_description1.jpg
14646_description2.jpg
14646_description3.jpg
146472_description1.jpg
146472_description2.jpg
146472_description3.jpg
146500_description1.jpg
146500_description2.jpg
146500_description3.jpg
146500_description4.jpg
146500_description5.jpg
146500_description6.jpg
To get the number of files in that folder down, I'd like to put them all into folders grouped by the number at the start.
ie:
146467/146467_description1.jpg
146467/146467_description2.jpg
146467/146467_description3.jpg
146467/146467_description4.jpg
14646/14646_description1.jpg
14646/14646_description2.jpg
14646/14646_description3.jpg
146472/146472_description1.jpg
146472/146472_description2.jpg
146472/146472_description3.jpg
146500/146500_description1.jpg
146500/146500_description2.jpg
146500/146500_description3.jpg
146500/146500_description4.jpg
146500/146500_description5.jpg
146500/146500_description6.jpg
I was thinking to try and use command line: find | awk {} | mv command or maybe write a script, but I'm not sure how to do this most efficiently.

If you really are dealing with millions of files, I suspect that a glob (*.jpg or [0-9]*_*.jpg) may fail because it makes a command line that's too long for the shell. If that's the case, you can still use find. Something like this might work:
find /path -name "[0-9]*_*.jpg" -exec sh -c 'f=${1##*/}; d="/target/${f%%_*}"; mkdir -p "$d"; mv "$1" "$d/"' sh {} \;
Broken out for easier reading, this is what we're doing:
find /path - run find, with /path as a starting point,
-name "[0-9]*_*.jpg" - match files that match this filespec in all directories,
-exec sh -c - execute the following on each file...
'f=${1##*/}; - put the filename, minus its directory part, into a variable...
d="/target/${f%%_*}"; - build the target directory name from everything before the first underscore...
mkdir -p "$d"; - make that directory (read mkdir's man page about the -p option)
mv "$1" "$d/"' - move the file into the directory.
sh {} - pass the found path to the script as $1 instead of pasting {} into the script text,
\; - end the -exec expression
On the up side, it can handle any number of files that find can handle (i.e. limited only by your OS). On the down side, it's launching a separate shell for each file to be handled.
Note that the above answer is for Bourne/POSIX/Bash. If you're using CSH or TCSH as your shell, the following might work instead:
#!/bin/tcsh
foreach f (*_*.jpg)
set split = ($f:as/_/ /)
mkdir -p "$split[1]"
mv "$f" "$split[1]/"
end
This assumes that the filespec will fit in tcsh's glob buffer. I've tested with 40000 files (894KB) on one command line and not had a problem using /bin/sh or /bin/csh in FreeBSD.
Like the Bourne/POSIX/Bash parameter expansion solution above, this avoids unnecessary calls to external programs. I haven't tested it with millions of files, though, and would recommend the find solution even though it's slower.
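A loop built only on parameter expansion could look roughly like the sketch below. The function name group_by_prefix is made up, and it assumes all files sit directly in one directory. Since a for loop expands the glob inside the shell rather than passing it to another program, the kernel's limit on argument length should not apply here, though memory use still grows with the number of matches.

```shell
#!/bin/sh
# group_by_prefix (hypothetical name): move NNN_description.jpg files in one
# directory into per-number subdirectories, using parameter expansion for
# the string handling and only mkdir and mv as external commands.
group_by_prefix() {
    dir=$1
    for f in "$dir"/[0-9]*_*.jpg; do
        [ -f "$f" ] || continue       # also skips the unexpanded pattern
        name=${f##*/}                 # strip the directory part
        prefix=${name%%_*}            # everything before the first "_"
        case $prefix in
            *[!0-9]*) continue ;;     # prefix is not purely numeric: skip
        esac
        mkdir -p "$dir/$prefix" && mv -- "$f" "$dir/$prefix/"
    done
}
```

For example, group_by_prefix . would sort the files in the current directory.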

You can use this script:
for i in [0-9]*_*.jpg; do
p=`echo "$i" | sed 's/^\([0-9]*\)_.*/\1/'`
mkdir -p "$p"
mv "$i" "$p"
done

Using grep
for file in *.jpg;
do
dirName=$(echo $file | grep -oE '^[0-9]+')
[[ -d $dirName ]] || mkdir $dirName
mv $file $dirName
done
grep -oE '^[0-9]+' extracts the starting digits in the filename as
146467
146467
146467
146467
14646
...
[[ -d $dirName ]] returns 0 (success) if the directory exists
[[ -d $dirName ]] || mkdir $dirName ensures that mkdir runs only if the test [[ -d $dirName ]] fails, that is, if the directory does not exist

Move files containing X but not containing Y

To manage my backup sync folder, I am trying to come up with a command that would move files beginning with string1* but NOT ending with *string2 from /folder1 to /folder2
What would a command containing such two opposite conditions (HAS and HAS NOT) look like?

#!/bin/bash
for i in `ls -d /folder1/string1* | grep -v 'string2$'`
do
ls -ld $i | grep '^-' > /dev/null # Test that we have a regular file and not a directory etc.
if [ $? == 0 ]; then
mv $i /folder2
fi
done

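Try something like
find /folder1 -mindepth 1 -maxdepth 1 -type f \
-name 'string1*' \! -name '*string2' -exec cp -iv -t /folder2 {} +
Note: {} has to come directly before the +, hence the -t option (GNU cp) to name the target directory first. If you have an older version of find, or a cp without -t, you can use -exec cp -iv {} /folder2 \; instead. Swap cp for mv to move the files rather than copy them.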

To me this is another case for (what I shall denote) the read while pattern.
cd /folder1
ls string1* | grep -v 'string2$' | while IFS= read -r f; do mv "$f" /folder2; done
The other answers are good alternatives, and in particular, find can do a lot. But I always get a headache using find, and never quite use it enough to do so without the manpage open.
Also, starting with ls or a simple find to get a list of files, and then using any or all of sed, awk, grep or whatever you have to hand, to adjust/trim/extend this list, and then bunging it into a loop, is a crude(ish) but pretty powerful technique.
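If you would rather not parse the output of ls at all, the two conditions can also be expressed with a glob and a case pattern. This is only a sketch; the function name move_matching and its four arguments are invented for illustration.

```shell
#!/bin/sh
# move_matching (hypothetical name): move regular files in src whose names
# start with prefix but do not end with suffix into dest.
move_matching() {
    src=$1 dest=$2 prefix=$3 suffix=$4
    for f in "$src"/"$prefix"*; do
        [ -f "$f" ] || continue          # regular files only
        case ${f##*/} in
            *"$suffix") continue ;;      # has the excluded ending: skip
        esac
        mv -- "$f" "$dest"/
    done
}
```

With the names from the question this would be called as move_matching /folder1 /folder2 string1 string2.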

Shell script to find and move files and delete the affected folders

Is it possible to create a shell script that finds files of a specific type in and below a folder and moves them to another folder (this part is already solved*), and then removes all folders the files were moved from?
I guess it should be somehow possible to write the path/folder name into an array or file and then read from there to "rm" them.
finding files example:
find . \( -name "*.mpeg" -o -name "*.mkv" -o -name "*.avi" -o -name "*.mov" \) -size +536870912c -exec mv -n -v {} /my/favorite/folder/ \;
The second part, deleting the folders the files were moved from, is still missing.

If you don't mind using other executables, you can do it like this:
find . \( -name "*.mpeg" -o -name "*.mkv" -o -name "*.avi" -o -name "*.mov" \) -size +536870912c | \
gawk -F'/[^/]*$' '{print "echo mv " $0 " /WHERE/TO/MOVE" ; dirstoremove[$1]++ }
END {for (d in dirstoremove) {print "echo rmdir " d } }' | \
bash
First, find locates your items.
Then gawk prints a mv command for each file, stores the file's directory in an array, and after processing all the files prints an rmdir for every element in the array.
Finally, its output is sent to bash, which executes it.
Note: the above won't move or remove anything; it just echoes what would be done. If you are satisfied with the result and want to execute it, you can remove the two "echo"s or simply add one more | bash to the end.

If you like one-liners (like I do), you can extend your command as follows:
<your find command> -print | xargs -l dirname | sort -u | xargs rmdir
-print echoes the path of the file that has been moved.
xargs -l dirname strips off the filenames, resulting in a list of directory names.
sort -u reduces the list of directory names by removing duplicates.
xargs rmdir removes the directories.
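Both steps can also be done in one pass. The sketch below moves each file and then tries to rmdir the directory it came from; rmdir only succeeds once that directory is empty, so folders that still hold other files are left alone. The function name move_and_prune is hypothetical, the destination is assumed to lie outside the searched tree, and the -size test from the question is left out for brevity.

```shell
#!/bin/sh
# move_and_prune (hypothetical name): move matching files to dest, then try
# to remove each directory a file was taken from.
move_and_prune() {
    root=$1   # tree to search
    dest=$2   # where the files go; assumed to lie outside root
    find "$root" -type f \
        \( -name '*.mpeg' -o -name '*.mkv' -o -name '*.avi' -o -name '*.mov' \) \
        -exec sh -c '
            dest=$1; shift
            for path do
                dir=${path%/*}
                if mv -n -- "$path" "$dest"/; then
                    # rmdir refuses non-empty directories, so this only
                    # succeeds after the last entry has left dir
                    rmdir -- "$dir" 2>/dev/null || :
                fi
            done
        ' sh "$dest" {} +
}
```

Note that only the immediate parent directory of each moved file is considered; a parent that becomes empty as a result is not removed.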

Shell command/script to delete files whose names are in a text file

I have a list of files in a .txt file (say list.txt). I want to delete the files in that list. I haven't done scripting before. Could someone give me the shell script/command I can use? I have the bash shell.

while read -r filename; do
rm "$filename"
done <list.txt
is slow.
rm $(<list.txt)
will fail if there are too many arguments.
I think it should work:
xargs -a list.txt -d'\n' rm

Try this command:
rm -f $(<file)

If the file names have spaces in them, answers that rely on word splitting, such as rm $(<list.txt), will treat each word as a separate file name. Assuming the list of files is in list.txt, with one name per line, this handles spaces and backslashes:
while IFS= read -r name; do
rm "$name"
done < list.txt
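A slightly more defensive version of this loop, as a sketch: it skips blank lines, copes with a last line that lacks a trailing newline, and uses -- so that names starting with a dash are not taken as options. The function name delete_listed is made up for illustration.

```shell
#!/bin/sh
# delete_listed (hypothetical name): remove every file named in a list,
# one name per line, keeping spaces and backslashes in names intact.
delete_listed() {
    list=$1
    # The "|| [ -n ... ]" part picks up a final line without a newline
    while IFS= read -r name || [ -n "$name" ]; do
        [ -n "$name" ] || continue    # skip blank lines
        rm -f -- "$name"
    done < "$list"
}
```

It would be called as delete_listed list.txt.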

For fast execution on macOS, where xargs does not support the -d option for a custom delimiter:
<list.txt tr "\n" "\0" | xargs -0 rm

The following should work and leaves you room to do other things as you loop through.
Edit: Don't do this, see here: http://porkmail.org/era/unix/award.html
for file in $(cat list.txt); do rm $file; done

I was just looking for a solution to this today and ended up using a modified solution from some answers and some utility functions I have.
# This is in my .bash_profile
# Find
ffe () { /usr/bin/find . -name '*'"$@" ; } # ffe: Find file whose name ends with a given string
# Delete Gradle Logs
function delete_gradle_logs() {
(cd ~/.gradle && ffe .out.log | xargs -I@ rm @)
}

On Linux, you can try:
printf "%s\n" $(<list.txt) | xargs -I@ rm @
In my case, my .txt file contained a list of items of the kind *.ext, and it worked fine.