In practicing bash, I tried writing a script that searches the home directory for duplicate files and deletes them. Here's what my script looks like now.
#!/bin/bash
# create-list: create a list of regular files in a directory
declare -A arr1 sumray origray
if [[ -d "$HOME/$1" && -n "$1" ]]; then
    echo "$1 is a directory"
else
    echo "Usage: create-list Directory | options" >&2
    exit 1
fi
for i in $HOME/$1/*; do
    [[ -f $i ]] || continue
    arr1[$i]="$i"
done
for i in "${arr1[@]}"; do
    Name=$(sed 's/[][?*]/\\&/g' <<< "$i")
    dupe=$(find ~ -name "${Name##*/}" ! -wholename "$Name")
    if [[ $(find ~ -name "${Name##*/}" ! -wholename "$Name") ]]; then
        mapfile -t sumray["$i"] < <(find ~ -name "${Name##*/}" ! -wholename "$Name")
        origray[$i]=$(md5sum "$i" | cut -c 1-32)
    fi
done
for i in "${!sumray[@]}"; do
    poten=$(md5sum "$i" | cut -c 1-32)
    for i in "${!origray[@]}"; do
        if [[ "$poten" = "${origray[$i]}" ]]; then
            echo "${sumray[$i]} is a duplicate of $i"
        fi
    done
done
Originally, in place of the mapfile -t sumray["$i"] < <(find ~ -name "${Name##*/}" ! -wholename "$Name") line above, I had the following:
sumray["$i"]=$(find ~ -name "${Name##*/}" ! -wholename "$Name")
This saved the output of find to the array. But I had an issue. If a single file had multiple duplicates, then all locations found by find would be saved to a single value. I figured I could use the mapfile command to fix this, but now it's not saving anything to my array at all. Does it have to do with the fact that I'm using an associative array? Or did I just mess up elsewhere?
I'm not sure if I'm allowed to answer my own question, but I figured that I should post how I solved my problem.
As it turns out, the mapfile command does not work on associative arrays at all. So my fix was to save the output of find to a text file and then store that information in an indexed array. I have tested this a few times and haven't encountered any errors yet.
Here's my finished script.
#!/bin/bash
# create-list: create a list of regular files in a directory
declare -A arr1 origray
declare indexray
#Verify that Parameter is a directory.
if [[ -d "$HOME/$1/" && -n "$1" ]]; then
    echo "Searching for duplicates of files in $1"
else
    echo "Usage: create-list Directory | options" >&2
    exit 1
fi
#create list of files in specified directory
for i in $HOME/${1%/}/*; do
    [[ -f $i ]] || continue
    arr1[$i]="$i"
done
#search for all duplicate files in the home directory
#by name
#find checksum of files in specified directory
for i in "${arr1[@]}"; do
    Name=$(sed 's/[][?*]/\\&/g' <<< "$i")
    if [[ $(find ~ -name "${Name##*/}" ! -wholename "$Name") ]]; then
        find ~ -name "${Name##*/}" ! -wholename "$Name" >> temp.txt
        origray[$i]=$(md5sum "$i" | cut -c 1-32)
    fi
done
#create list of duplicate file locations.
if [[ -f temp.txt ]]; then
    mapfile -t indexray < temp.txt
else
    echo "No duplicates were found."
    exit 0
fi
#compare similarly named files by checksum and delete duplicates
count=0
for i in "${!indexray[@]}"; do
    poten=$(md5sum "${indexray[$i]}" | cut -c 1-32)
    for i in "${!origray[@]}"; do
        if [[ "$poten" = "${origray[$i]}" ]]; then
            echo "${indexray[$count]} is a duplicate of a file in $1."
        fi
    done
    count=$((count+1))
done
rm temp.txt
This is kind of sloppy, but it does what it's supposed to do. md5sum may not be the optimal way to check for file duplicates, but it works. All I have to do is replace echo "${indexray[$count]} is a duplicate of a file in $1." with rm -i "${indexray[$count]}" and it's good to go.
So my next question would have to be...why doesn't mapfile work with associative arrays?
I have a script that removes invalid characters from filenames, due to OneDrive restrictions. It works except for files that have * { } in them. I have added * { } to the list, but that is not working (it ignores those files). The script is below. I'm not sure what I am doing wrong here.
#Renames FOLDERS with space at the end
IFS=$'\n'
for file in $(find -d . -name "* ")
do
    target_name=$(echo "$file" | sed 's/^[[:space:]]*//;s/[[:space:]]*$//')
    if [ "$file" != "$target_name" ]; then
        if [ -e $target_name ]; then
            echo "WARNING: $target_name already exists, file not renamed"
        else
            echo "Move $file to $target_name"
            mv "$file" "$target_name"
        fi
    fi
done
#end Folder rename
#Renames FILES
declare -a arrayGrep=(\? \* \, \# \; \: \& \# \+ \< \> \% \$ \~ \% \: \< \> )
echo "array: ${arrayGrep[@]}"
for i in "${arrayGrep[@]}"
do
    for file in $(find . | grep $i )
    do
        target_name=$(echo "$file" | sed 's/\'$i'/-/g' )
        if [ "$file" != "$target_name" ]; then
            if [ -e $target_name ]; then
                echo "WARNING: $target_name already exists, file not renamed"
            else
                echo "Move $file to $target_name"
                mv "$file" "$target_name"
            fi
        fi
    done
done
There are some issues with your code:
You've got some duplicates in arrayGrep.
You have some quoting issues: $i in the grep and sed commands must be protected from the shell.
Some of the disallowed characters, even if quoted to protect them from the shell, could be mis-parsed by grep as metacharacters.
You loop rather a lot when all substitutions could happen simultaneously.
for file in $(find ...) has to read the entire list into memory, which might be problematic for large lists; it also breaks with some filenames due to word-splitting. Piping into read is better.
find can do path filtering without grep (at least for this set of disallowed characters).
sed is fine to use, but tr is a little neater here.
A possible rewrite is:
badchars='?*,#;:&#+<>%$~'
find . -name "*[$badchars]*" | while read -r file
do
    target_name=$(echo "$file" | tr "$badchars" - )
    if [ "$file" != "$target_name" ]; then
        if [ -e $target_name ]; then
            echo "WARNING: $target_name already exists, file not renamed"
        else
            echo "Move $file to $target_name"
            mv "$file" "$target_name"
        fi
    fi
done
If using bash, you can even do parameter expansion directly,
although you have to embed the list:
target_name="${file//[?*,#;:&#+<>%$~]/-}"
an enhancement idea
If you choose a suitable character (eg. =), you can rename filenames reversibly (but only if the length of the new name wouldn't exceed the maximum allowed length for a filename). One possible algorithm:
replace all = in the filename with =3D
replace all the other disallowed characters with =hh where hh is the appropriate ASCII code in hex
You can reverse the renaming by:
replace all =hh (except =3D) with the character corresponding to ASCII code hh
replace all =3D with =
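A minimal bash sketch of that scheme (the function names are illustrative, and badchars is assumed to hold the disallowed set; neither is from the original answer):

badchars='?*,#;:&+<>%$~'

encode_name() {
    local name=$1 out= c i
    for (( i = 0; i < ${#name}; i++ )); do
        c=${name:i:1}
        if [[ $c == '=' ]]; then
            out+='=3D'                    # escape the escape character itself
        elif [[ $c == [$badchars] ]]; then
            printf -v c '=%02X' "'$c"     # =hh, hh = ASCII code in hex
            out+=$c
        else
            out+=$c
        fi
    done
    printf '%s\n' "$out"
}

decode_name() {
    local name=$1 out= ch
    # consume the name left to right, so each =hh (including =3D) is decoded exactly once
    while [[ $name =~ ^([^=]*)=([0-9A-Fa-f]{2})(.*)$ ]]; do
        out+=${BASH_REMATCH[1]}
        printf -v ch "\\x${BASH_REMATCH[2]}"
        out+=$ch
        name=${BASH_REMATCH[3]}
    done
    printf '%s\n' "$out$name"
}

encode_name 'a*b=c?d'          # prints a=2Ab=3Dc=3Fd
decode_name 'a=2Ab=3Dc=3Fd'    # prints a*b=c?d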
I am trying to convert the following script which I use to create archives into one which extracts them.
[[ $# -lt 2 ]] && exit 1
name=$1; shift
files=("$@")
#exclude all files/directories that are not readable
for index in "${!files[@]}"; do
    [[ -r ${files[index]} ]] || unset "files[index]"
done
[[ ${#files[@]} -eq 0 ]] && exit 1
if tar -czvf "${name:-def_$$}.tar.gz" "${files[@]}"; then
    echo "Ok"
else
    echo "Error"
    exit 1
fi
So far I have this:
[[ $# -lt 1 ]] && exit 1
files=("$@")
#remove files and directories which are not readable
for index in "${!files[@]}"; do
    [[ -r ${files[index]} ]] || unset "files[index]"
done
[[ ${#files[@]} -eq 0 ]] && exit 1
if tar -xzvf "${files[@]}".tar.gz; then
    echo "OK"
else
    echo "Error"
    exit 1
fi
I don't know whether I need to keep the shift, since for this script I don't need to discard any arguments; I want to take them all and extract each one. Also, I see there is a -C switch which lets the user choose where the extracted files go. How would I go about adding this as an option, given that the user may or may not want to change the directory the files get extracted to?
You unfortunately can't just do tar -xzvf one.tar.gz two.tar.gz. The straightforward approach is to use a good old for loop:
for file in "${files[@]}"; do
    tar -xzvf "$file"
done
Or you can use this:
cat "${files[#]}" | tar -xzvf - -i
You can make the first argument the target directory for the -C option:
[[ $# -lt 2 ]] && exit 1
target=$1; shift
files=("$@")
#remove files and directories which are not readable
for index in "${!files[@]}"; do
    [[ -r ${files[index]} ]] || unset "files[index]"
done
[[ ${#files[@]} -eq 0 ]] && exit 1
mkdir -p -- "$target" || exit 1
for file in "${files[@]}"; do
    tar -xzvf "$file" -C "$target"
done
./script /some/path one.tar.gz two.tar.gz
The list of files for tar can also be constructed like this:
target=$1; shift
for file; do
    [[ -r $file ]] && files+=("$file")
done
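(A for loop without an in list iterates over the positional parameters, i.e. it is shorthand for for file in "$@".)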
I have the following files and directories:
/tmp/jj/
/tmp/jj/ese
/tmp/jj/ese/2010
/tmp/jj/ese/2010/test.db
/tmp/jj/dfhdh
/tmp/jj/dfhdh/2010
/tmp/jj/dfhdh/2010/rfdf.db
/tmp/jj/ddfxcg
/tmp/jj/ddfxcg/2010
/tmp/jj/ddfxcg/2010/df.db
/tmp/jj/ddfnghmnhm
/tmp/jj/ddfnghmnhm/2010
/tmp/jj/ddfnghmnhm/2010/sdfs.db
I want to rename each 2010 directory to the name of its parent directory, then tar all the .db files...
What I tried is:
#!/bin/bash
if [ $# -ne 1 ]; then
    echo "Usage: `basename $0` <absolute-path>"
    exit 1
fi
if [ "$(id -u)" != "0" ]; then
    echo "This script must be run as root" 1>&2
    exit 1
fi
rm /tmp/test
find $1 >> /tmp/test
for line in $(cat /tmp/test)
do
    arr=$( (echo $line | awk -F"/" '{for (i = 1; i < NF; i++) if ($i == "2010") print $(i-1)}') )
    for index in "${arr[@]}"
    do
        echo $index #HOW TO WRITE MV COMMAND RATHER THAN ECHO COMMAND?
    done
done
1) The result is:
ese
ese
dfhdh
dfhdh
ddfxcg
ddfxcg
ddfnghmnhm
ddfnghmnhm
But it should be:
ese
dfhdh
ddfxcg
ddfnghmnhm
2) How can I rename each 2010 directory to the name of its parent directory?
I mean, how do I do the following (in a loop, because of the large number of dirs):
mv /tmp/jj/ese/2010 /tmp/jj/ese/ese
mv /tmp/jj/dfhdh/2010 /tmp/jj/dfhdh/dfhdh
mv /tmp/jj/ddfxcg/2010 /tmp/jj/ddfxcg/ddfxcg
mv /tmp/jj/ddfnghmnhm/2010 /tmp/jj/ddfnghmnhm/ddfnghmnhm
You could instead use find in order to determine if a directory contains a subdirectory named 2010 and perform the mv:
find /tmp -type d -exec sh -c '[ -d "{}"/2010 ] && mv "{}"/2010 "{}"/$(basename "{}")' -- {} \;
I'm not sure if you have any other question here but this would do what you've listed at the end of the question, i.e. it would:
mv /tmp/jj/ese/2010 /tmp/jj/ese/ese
and so on...
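Note that embedding {} inside the sh -c string can misbehave on directory names containing quotes or $(...). A sketch of a variant that passes each path as a positional parameter instead (assumptions: GNU find, and matching the 2010 directories directly):

find /tmp -type d -name 2010 -exec sh -c '
    parent=${1%/*}                           # /tmp/jj/ese/2010 -> /tmp/jj/ese
    mv "$1" "$parent/$(basename "$parent")"  # -> /tmp/jj/ese/ese
' _ {} \;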
Can be done using grep -P:
grep -oP '[^/]+(?=/2010)' file
ese
ese
dfhdh
dfhdh
ddfxcg
ddfxcg
ddfnghmnhm
ddfnghmnhm
This should be close:
find "$1" -type d -name 2010 -print |
while IFS= read -r dir
do
parentPath=$(dirname "$dir")
parentDir=$(basename "$parentPath")
echo mv "$dir" "$parentPath/$parentDir"
done
Remove the echo after testing. If your dir names can contain newlines then look into the -print0 option for find, and the -0 option for xargs.
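For example, a newline-safe variant of the same loop using read -d '' (a sketch, not from the answer above; it sidesteps xargs entirely):

find "$1" -type d -name 2010 -print0 |
while IFS= read -r -d '' dir
do
    parentPath=$(dirname "$dir")
    parentDir=$(basename "$parentPath")
    echo mv "$dir" "$parentPath/$parentDir"
done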
First, only iterate through the dirs you're interested in, and avoid temporary files:
for d in $(find $1 -type d -name '2010') ; do
Then you can use basename and dirname to extract parts of that directory name and reconstruct the desired one. Something like:
b="$(dirname $d)"
p="$(basename $b)"
echo mv "$d" "$b/$p"
You could use shell string replace operations instead of basename/dirname.
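For example, a sketch of the same loop body using only parameter expansion (equivalent to the dirname/basename calls above, assuming paths without trailing slashes):

    b="${d%/*}"     # like dirname: drop the last path component
    p="${b##*/}"    # like basename: keep only the last component
    echo mv "$d" "$b/$p"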
I'm curious as to why the following:
array1=(file1 file2 file3)
array2=()
for i in ${array1[@]}
do
    find . -name $i -type f -print0 2>/dev/null | \
    while read -d '' -r file
    do
        array2+=( $file )
    done
done
fails to populate array2, assuming the filenames file1, file2, and file3 exist in subdirectories below the directory where the search starts. I would appreciate it if someone could point out where I misstepped here.
Try this:
array1=(file1 file2 file3)
array2=()
for i in "${array1[@]}"
do
    while read -d '' -r file
    do
        array2+=( "$file" )
    done < <(find . -name "$i" -type f -print0)
done
Because of the pipe, the while loop runs in a subshell, and your array2 values are lost when that subshell exits.
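A minimal demonstration of the difference (illustrative, not from the question):

arr=()
printf 'a\nb\n' | while read -r x; do arr+=("$x"); done
echo "${#arr[@]}"    # prints 0 -- the loop ran in a subshell

arr=()
while read -r x; do arr+=("$x"); done < <(printf 'a\nb\n')
echo "${#arr[@]}"    # prints 2 -- process substitution keeps the loop in the current shell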
If you are using bash 4, you can avoid using find:
shopt -s globstar
array1=(file1 file2 file3)
array2=()
for i in "${array1[@]}"
do
    for f in **/"$i"; do
        [[ -f "$f" ]] && array2+=( "$f" )
    done
done
I am having trouble with a shell script I have written. I am confused by the behavior of the tail command, and when I view the output of error.log on the terminal it shows lines with 'e' deleted from words.
Please guide me on how to solve this. I want to read the error.log file line by line, and while reading, split fixed numbers of lines off into small files with suffixes, i.e. log-aa, log-ab, ...; I did this using the split command. After splitting, I want to filter the lines that contain the word GET or POST using a regex and store the filtered lines in a new file. After that is complete, I need to delete all of the log-* files.
I have written like this:
processLine(){
    line="$@"
    echo $line
    $ tail -f $FILE
}
FILE="/var/log/apache2/error.log"
if [ "$1" == "/var/log/apache2/error.log" ]; then
    FILE="/dev/stdin"
else
    FILE="$1"
    if [ ! -f $FILE ]; then
        echo "$FILE : does not exists"
        exit 1
    elif [ ! -r $FILE ]; then
        echo "$FILE: can not read"
        exit 2
    fi
fi
#BAKIFS=$IFS
#IFS=$(echo -en "\n\b")
exec 3<&0
exec 0<"$FILE"
#sed -e 's/\[debug\].*\(data\-HEAP\)\:\/-->/g' error.log > /var/log/apache2/error.log.1
while read -r line
do
    processLine $line
done
exec 0<&3
IFS=$BAKIFS
logfile="/var/log/apache2/error.log"
pattern="bytes"
# read each new line as it gets written
# to the log file
#tail -1 $logfile
tail -fn0 $logfile | while read line ; do
    # check each line against our pattern
    echo "$line" | grep -i "$pattern"
    #sed -e 's/\[debug\].*\(data\-HEAP\)\:/-->/g' error.log >/var/log/apache2/error.log
    split -l 1000 error.log log-
    FILE2="/var/log/apache2/log-*"
    if [ "$1" == "/var/log/apache2/log-*" ]; then
        FILE2="/dev/stdin"
    else
        FILE2="$1"
        if [ ! -f $FILE2 ]; then
            echo "$FILE : does not exists"
            exit 1
        elif [ ! -r $FILE2 ]; then
            echo "$FILE: can not read"
            exit 2
        fi
    fi
    BAKIFS=$IFS
    IFS=$(echo -en "\n\b")
    exec 3<&0
    exec 0<"$FILE2"
    while read -r line
    do
        processLine $line
        echo $line >>/var/log/apache2/url.txt
    done
    exec 0<&3
    IFS=$BAKIFS
    find . -name "var/log/apache2/logs/log-*.*" -delete
done
exit 0
The code below deletes the files after reading and splitting error.log, but when I put in tail -f $FILE it stops deleting the files. I want to delete the log-* files after the last line of error.log has been reached:
processLine(){
    line="$@"
    echo $line
}
FILE=""
if [ "$1" == "" ]; then
    FILE="/dev/stdin"
else
    FILE="$1"
    # make sure file exist and readable
    if [ ! -f $FILE ]; then
        echo "$FILE : does not exists"
        exit 1
    elif [ ! -r $FILE ]; then
        echo "$FILE: can not read"
        exit 2
    fi
fi
#BAKIFS=$IFS
#IFS=$(echo -en "\n\b")
exec 3<&0
exec 0<"$FILE"
while read -r line
do
    processLine $line
    split -l 1000 error.log log-
    cat log-?? | grep "GET\|POST" > storefile
    #tail -f $FILE
done
rm log-??
exec 0<&3
#IFS=$BAKIFS
exit 0
Your code seems unnecessarily long and complex, and the logic is unclear. It should not have been allowed to grow this big without working correctly.
Consider this:
split -l 1000 error.log log-
cat log-?? | grep "GET\|POST" > storefile
rm log-??
Experiment with this. If these three commands do what you expect, you can add more functionality (e.g. using paths, checking for the existence of error.log), but don't add code until you have this part working.
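For instance, a minimal sketch with those checks added (the log directory path is an assumption, not from the question):

#!/bin/bash
logdir=/var/log/apache2
cd "$logdir" || exit 1
# make sure the log exists and is readable before splitting
[ -r error.log ] || { echo "error.log: cannot read" >&2; exit 2; }
split -l 1000 error.log log-
cat log-?? | grep "GET\|POST" > storefile
rm log-??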