Adapt file renaming script so that it searches out each file in sub-directories - arrays

I have a .csv file which looks something like this:
unnamed_0711-42_p1.mov,day1_0711-42_p1.mov
unnamed_0711-51_p2.mov,day1_0711-51_p2.mov
unnamed_0716-42_p1_2.mov,day1_0716-42_p1_2.mov
unnamed_0716-51_p2_2.mov,day1_0716-51_p2_2.mov
I have written this code to rename files from the name in field 1 (e.g. unnamed_0711-42_p1.mov), to the name in field 2 (e.g. day1_0711-42_p1.mov).
csv=/location/rename.csv
cat "$csv" | while IFS=, read -r -a arr; do mv "${arr[@]}"; done
However, this script only works when it and all the files that need to be renamed are in the same directory. This was okay previously, but now I need to find files in various subdirectories (without adding the full path to my .csv file).
How can I adapt my script so that it searches out the files in subdirectories and then changes the name as before?

A simple way to make this work, though it leads to an inefficient script, is this:
for D in $(find . -type d)
do
    # run the rename loop from inside each directory
    # (note: $csv must be an absolute path for this to work after the cd)
    (cd "$D" && while IFS=, read -r -a arr; do mv "${arr[@]}"; done < "$csv")
done
This will run your rename loop in every directory under the current one, but it works through the entire list of filenames for every subdirectory. An alternative would be to search for each file as you process its name:
csv=files.csv
while IFS=, read -ra arr; do
    while IFS= read -r -d '' old_file; do
        old_dir=$(dirname "$old_file")
        mv "$old_file" "$old_dir/${arr[1]}"
    done < <(find . -name "${arr[0]}" -print0)
done < "$csv"
This uses find to locate each old filename, then uses dirname to get the directory of the old file (which we need so that mv does not place the renamed file into a different directory).
This will rename every instance of each file (i.e., if unnamed_0711-42_p1.mov appears in multiple subdirectories, each instance will be renamed to day1_0711-42_p1.mov). If you know each file name will only appear once, you can speed things up a bit by adding -quit after -print0 in the find command, as sketched below.
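For example, a minimal sketch of that single-match variant (same loop as above; only the find invocation changes):
#!/bin/bash
# Sketch: same rename loop, but stop searching after the first match for
# each CSV row (assumes every name occurs at most once in the tree).
csv=files.csv
while IFS=, read -ra arr; do
    while IFS= read -r -d '' old_file; do
        old_dir=$(dirname "$old_file")
        mv "$old_file" "$old_dir/${arr[1]}"
    done < <(find . -name "${arr[0]}" -print0 -quit)   # -quit ends find after the first hit
done < "$csv"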

The script below should do it:
while IFS=, read -ra arr   # -r to prevent mangling backslashes
do
    find . -type f -name "${arr[0]}" -printf "mv '%p' '%h/${arr[1]}'\n" | bash
done < csvfile
See the find manpage to understand what the printf specifiers like %p and %h do.
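As a quick illustration (the matched path here is made up): for the first CSV row, if find located ./footage/day1/unnamed_0711-42_p1.mov, then %p is the full path and %h its directory, so the line handed to bash would be:
# Hypothetical match, for illustration only:
#   %p -> ./footage/day1/unnamed_0711-42_p1.mov   (full path of the matched file)
#   %h -> ./footage/day1                          (directory containing it)
mv './footage/day1/unnamed_0711-42_p1.mov' './footage/day1/day1_0711-42_p1.mov'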

Related

Script to read file, find in the file system, and move to a directory

I'm horrible with scripting and have been poring over sites for the past 2 days and haven't found a solution, nor been able to piece one together. I'm at my wits' end and need some help.
I have a text file with partial names and numbers that I need to find within a directory and its subdirectories and move to a separate directory.
I've tried the following with no luck:
#!/bin/bash
getArray() {
    array=()
    while IFS= read -r line
    do
        array+=("$line")
    done < "$1"
}
getArray "dan.txt"
for e in "${array[@]}"
do
    mv "$e" /root/moved
done
Thanks in advance.
If by partial file names you mean each entry matches part of a file name, you can do:
find . -name "foo*"
If it also depends on the directory (i.e., any parent folders within the path), use the -regex option instead, which matches against the entire file path instead of just the file name:
find . -regex '.*/foo/[^/]*\.doc'
So:
while read -r line; do
    find . -name "$line"
done < "$1"
EDIT per the comments:
while read -r line; do
    find "/data/Match" -type f -iname "$line\.*" -exec mv -t /root/moved {} \;
done < "$1"
You may need to change the -iname argument to the following:
If they are complete file names (e.g., line with value "filename.txt" matches a file named "filename.txt"), then -iname "$line".
If they are partial file names (e.g., line with value "file" matches "file.txt" and "filename.txt"), then -iname "$line*".
If they are complete names without file extensions (e.g., line with value "file" matches "file.txt", but not "filename.txt"), then -iname "$line\.*".
Bit of an explanation for that find command btw:
-type f - files only (d for directories, l for symbolic links);
-iname - like -name, but case-insensitive;
-exec - command to execute for each matching file, ending with \;
mv -t /root/moved {} - the -t option specifies the target directory to move each file into; the source argument is {}, which find's -exec replaces with the path of the current matching file.
Note that you may need to play with "$line*" as I can't tell how vague your names are from the text file. If they will definitely match from the beginning (e.g., a line in the file with the value "somefile" only matches a file named in the format "somefile.*", where the * is any amount of characters), then "$line*" is the form you want.
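Putting the pieces above together, a minimal sketch of the whole script, assuming the list of names is passed as the first argument and that /data/Match and /root/moved are the search and destination directories from the comments:
#!/bin/bash
# Sketch: read names from the list file given as $1 and move each match.
while IFS= read -r line; do
    # Pick ONE -iname form, depending on how the names in the list look:
    #   -iname "$line"      exact file names
    #   -iname "$line*"     partial names (prefix match)
    #   -iname "$line.*"    complete names without an extension
    find /data/Match -type f -iname "$line.*" -exec mv -t /root/moved {} +
done < "$1"
The {} + form batches several matches into one mv call; use {} \; as in the answer above if you prefer one call per file.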

Looking to take only main folder name within a tarball & match it to folders to see if it's been extracted

I have a situation where I need to keep .tgz files & if they've been extracted, remove the extracted directory & contents.
In all examples, the only top-level directory within the tarball has a different name than the tarball itself:
[host1]$ find / -name "*\#*.tgz" #(has a # symbol somewhere in the name)
/1-#-test.tgz
[host1]$ tar -tzvf /1-#-test.tgz | head -n 1 | awk '{ print $6 }'
TJ #(directory name)
What I'd like to accomplish (pulling my hair out; rusty scripting fingers) is to look at each tarball and see if the corresponding directory name (like above) exists. If it does, echo "rm -rf /directoryname" into an output file for review.
I can read all of the tarballs into an array ... but how to check the directories?
Frustrated & appreciate any help.
Maybe you're looking for something like this:
find / -name "*#*.tgz" | while read -r line; do
    dir=$(tar ztf "$line" | awk -F/ '{print $1; exit}')
    test -d "$dir" && echo "rm -fr '$dir'"
done
Explanation:
We iterate over the *#*.tgz files found with a while loop, line by line
Get the list of files in the tgz file with tar ztf "$line"
Since the paths are separated by /, use that as the field separator in awk and print the first field, i.e. the top-level directory. After the print we exit, making this equivalent to, but more efficient than, using head -n1 first
With dir=$(...) we put the entire output of the tar..awk chain, thus the top-level directory of the first entry in the tar, into the variable dir
We check if such a directory exists; if it does, we echo an rm command so you can review it and execute it later if it looks good
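As a usage sketch (the script and review-file names here are made up): save the loop to a small script, redirect its output to a file, read it over, then execute it:
# Hypothetical file names, for illustration only.
bash check-tgz.sh > cleanup-review.txt    # check-tgz.sh holds the loop above
less cleanup-review.txt                   # inspect the generated rm commands
sh cleanup-review.txt                     # run them once you are satisfied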
My original answer used a find ... -exec but I think that's not so good in this particular case:
find / -name "*#*.tgz" -exec \
sh -c 'dir=$(tar ztf "{}" | awk -F/ "{print \$1; exit}");\
test -d "$dir" && echo "rm -fr \"$dir\""' \;
It's not so good because it runs a new sh for every file, and because we embed {} inside the sh -c string, we lose the usual benefit of find ... -exec, where special characters in the file name are handled correctly.
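If the .tgz paths might contain spaces or other awkward characters, one possible workaround (a sketch, not part of the original answers) is to hand the paths over null-delimited:
# Sketch: same check, but the .tgz paths are passed null-delimited so that
# spaces and other special characters survive intact.
find / -name "*#*.tgz" -print0 | while IFS= read -r -d '' tgz; do
    dir=$(tar ztf "$tgz" | awk -F/ '{print $1; exit}')
    [ -d "$dir" ] && echo "rm -fr '$dir'"
done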

remove blank first line script

I have this script which prints out the files that have a blank first line:
for f in `find . -regex ".*\.php"`; do
    for t in head; do
        $t -1 $f | egrep '^[ ]*$' >/dev/null && echo "blank line at the $t of $f";
    done;
done
How can I improve this to actually remove the blank line too, or at least copy all the files with a blank first line somewhere else?
I tried copying with this, which is good because it preserves the directory structure, but it was copying every php file, and I need to capture only the positive matches from the egrep and copy just those files.
rsync -R $f ../DavidSiteBlankFirst/
I would use sed personally
find ./ -type f -regex '.*\.php' -exec sed -i -e '1{/^[[:blank:]]*$/d;}' '{}' \;
This finds all the regular files ending in .php and executes the sed command, which works on the first line only, checks whether it is blank, and deletes it if it is; other blank lines in the file remain unaffected.
Just using find and sed:
find . -type f -name "*.php" -exec sed -i '1{/^\s*$/d;}' {} \;
The -type f option only finds files; not that I expect you would name folders with a .php suffix, but it's good practice. The use of -regex '.*\.php' is overkill and messier than simply globbing with -name "*.php". Use find's -exec instead of a shell loop; the sed script will operate on each matching file passed by find.
The sed script looks at the first line only (the leading 1) and applies the commands inside {} to that line. We check whether the line is blank with /^\s*$/ and delete (d) it if it matches. (Don't be tempted to add q to stop reading early: with -i, quitting before the rest of the file has been copied through would truncate it.) The -i option saves the change back to the file, as the default behaviour of sed is to print to stdout. If you want backup files made, use -i~ instead; this will create a backup file~ for each file.
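If you would rather copy the affected files somewhere first, as in the rsync -R attempt in the question, a minimal sketch (assuming ../DavidSiteBlankFirst/ is still the intended destination) could look like this:
#!/bin/bash
# Sketch: copy only the .php files whose first line is blank into
# ../DavidSiteBlankFirst/ (destination taken from the question),
# preserving the relative directory structure with rsync -R.
find . -type f -name '*.php' -print0 | while IFS= read -r -d '' f; do
    if head -n 1 "$f" | grep -q '^[[:blank:]]*$'; then
        rsync -R "$f" ../DavidSiteBlankFirst/
    fi
done
Because the path given to rsync -R is relative, each copy keeps its original directory layout under the destination.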

Need bash to separate cat'ed string to separate variables and do a for loop

I need to get a list of files added to a master folder and copy only the new files to the respective backup folders. The paths to each folder contain multiple folders, all named by numbers and only one level deep,
i.e. /tester/a/100
/tester/a/101 ...
diff -r typically returns one line like "Only in /testing/a/101: 2093_thumb.png" per file in the generated diff.txt.
NOTE: there is a space after the colon.
I need to get the 101 from the path and the filename into separate variables and copy the files to the backup folders.
I need the lesserfolder variable to hold 101 without the colon,
and the mainfile variable to hold 2093_thumb.png from each line of diff.txt, and then do the for loop, but I can't seem to get $file to behave. Each time I try echoing the variables I get all the wrong results.
#!/bin/bash
diff_file=/tester/diff.txt
mainfolder=/testing/a
bacfolder= /testing/b
diff -r $mainfolder $bacfolder > $diff_file
LIST=`cat $diff_file`
for file in $LIST
do
    maindir=$file[3]
    lesserfolder=
    mainfile=$file[4]
    # cp $mainfolder/$lesserFolder/$mainfile $bacfolder/$lesserFolder/$mainfile
    echo $maindir $mainfile $lesserfolder
done
If I could just get the echo statement working, the cp would then work too.
I believe this is what you want:
#!/bin/bash
diff_file=/tester/diff.txt
mainfolder=/testing/a
bacfolder=/testing/b
diff -r -q $mainfolder $bacfolder | egrep "^Only in ${mainfolder}" | awk '{print $3,$4}' > $diff_file
cat ${diff_file} | while read foldercolon mainfile ; do
    folderpath=${foldercolon%:}
    lesserFolder=${folderpath#${mainfolder}/}
    cp $mainfolder/$lesserFolder/$mainfile $bacfolder/$lesserFolder/$mainfile
done
But it is much more reliable (and much easier!) to use rsync for this kind of backup. For example:
rsync -a /testing/a/* /testing/b/
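For instance, a hedged sketch of how that could look when you only want the new files copied (the --ignore-existing flag and trailing-slash paths are my additions, based on the paths in the question):
# Dry run first: list what would be copied, change nothing.
rsync -a -n --ignore-existing /testing/a/ /testing/b/
# Real run: copy only files that don't already exist under /testing/b/.
rsync -a --ignore-existing /testing/a/ /testing/b/
The trailing slashes make rsync copy the contents of /testing/a into /testing/b rather than creating a nested a directory.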
You could try a while read loop:
diff -r $mainfolder $bacfolder | while read dummy dummy dir file; do
    echo $dir $file
done
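If you go that route, here is a small sketch of how the trailing colon could be stripped before copying (variable names as in the question; it only acts on the "Only in $mainfolder/..." lines):
# Sketch: act only on the "Only in $mainfolder/..." lines, strip the
# trailing colon, and copy into the matching backup subfolder.
diff -r "$mainfolder" "$bacfolder" | grep "^Only in $mainfolder/" | \
while read -r dummy1 dummy2 dir file; do
    dir=${dir%:}                      # /testing/a/101: -> /testing/a/101
    lesser=${dir#"$mainfolder"/}      # /testing/a/101  -> 101
    cp "$dir/$file" "$bacfolder/$lesser/$file"
done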

Shell command/script to delete files whose names are in a text file

I have a list of files in a .txt file (say list.txt). I want to delete the files in that list. I haven't done scripting before. Could someone give me the shell script/command I can use? I have the bash shell.
while read -r filename; do
    rm "$filename"
done < list.txt
is slow.
rm $(<list.txt)
will fail if there are too many arguments.
I think this should work:
xargs -a list.txt -d'\n' rm
Try this command:
rm -f $(<file)
If the file names have spaces in them, the answers that rely on word splitting will treat each word as a separate file name. Assuming the list of files is in list.txt, reading it line by line handles spaces:
while read name; do
    rm "$name"
done < list.txt
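If the names might also carry leading whitespace or backslashes, a slightly more defensive variant of the same loop (my additions are IFS=, -r, and --):
# Sketch: IFS= preserves leading/trailing spaces, -r keeps backslashes
# literal, and -- protects against names starting with a dash.
while IFS= read -r name; do
    rm -- "$name"
done < list.txt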
For fast execution on macOS, where the xargs custom delimiter option (-d) is not available:
<list.txt tr "\n" "\0" | xargs -0 rm
The following should work and leaves you room to do other things as you loop through.
Edit: Don't do this, see here: http://porkmail.org/era/unix/award.html
for file in $(cat list.txt); do rm $file; done
I was just looking for a solution to this today and ended up using a modified solution from some answers and some utility functions I have.
# This is in my .bash_profile
# Find
ffe () { /usr/bin/find . -name '*'"$@" ; }    # ffe: Find file whose name ends with a given string
# Delete Gradle Logs
function delete_gradle_logs() {
    (cd ~/.gradle; ffe .out.log | xargs -I# rm #)
}
On Linux, you can try:
printf "%s\n" $(<list.txt) | xargs -I# rm #
In my case, my .txt file contained a list of items of the kind *.ext and it worked fine.
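One usage note: swapping rm for echo first gives you a preview of what the pipeline would delete, which is a cheap safety net before the real run:
# Preview: print the commands instead of running them.
printf "%s\n" $(<list.txt) | xargs -I# echo rm #
# Real run, once the preview looks right.
printf "%s\n" $(<list.txt) | xargs -I# rm #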
