Renaming Files with "Invalid Characters" - arrays

I have a script that removes invalid characters from file names because of OneDrive restrictions. It works except for files that have * { } in their names. I have tried using * { }, but that does not work (those files are ignored). The script is below. I'm not sure what I am doing wrong here.
#Renames FOLDERS with space at the end
IFS=$'\n'
for file in $(find -d . -name "* ")
do
target_name=$(echo "$file" | sed 's/^[[:space:]]*//;s/[[:space:]]*$//')
if [ "$file" != "$target_name" ]; then
if [ -e $target_name ]; then
echo "WARNING: $target_name already exists, file not renamed"
else
echo "Move $file to $target_name"
mv "$file" "$target_name"
fi
fi
done
#end Folder rename
#Renames FILES
declare -a arrayGrep=(\? \* \, \# \; \: \& \# \+ \< \> \% \$ \~ \% \: \< \> )
echo "array: ${arrayGrep[@]}"
for i in "${arrayGrep[@]}"
do
for file in $(find . | grep $i )
do
target_name=$(echo "$file" | sed 's/\'$i'/-/g' )
if [ "$file" != "$target_name" ]; then
if [ -e $target_name ]; then
echo "WARNING: $target_name already exists, file not renamed"
else
echo "Move $file to $target_name"
mv "$file" "$target_name"
fi
fi
done
done

There are some issues with your code:
You've got some duplicates in arrayGrep
You have some quoting issues: $i in the grep and sed commands must be protected from the shell
Some of the disallowed characters, even if quoted to protect them from the shell, could be mis-parsed by grep as meta-characters
You loop rather a lot when all substitutions could happen simultaneously
for file in $(find ...) has to read the entire list into memory, which might be problematic for large lists. It also breaks with some filenames due to word-splitting. Piping into read is better
find can do path filtering without grep (at least for this set of disallowed characters)
sed is fine to use but tr is a little neater
A possible rewrite is:
badchars='?*,#;:&#+<>%$~'
# IFS= and -r keep leading spaces and backslashes in names intact
find . -name "*[$badchars]*" | while IFS= read -r file
do
    target_name=$(echo "$file" | tr "$badchars" - )
    if [ "$file" != "$target_name" ]; then
        # quote the name so paths with spaces are tested as one word
        if [ -e "$target_name" ]; then
            echo "WARNING: $target_name already exists, file not renamed"
        else
            echo "Move $file to $target_name"
            mv "$file" "$target_name"
        fi
    fi
done
If using bash, you can even do parameter expansion directly,
although you have to embed the list:
target_name="${file//[?*,#;:&#+<>%$~]/-}"
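One caveat with the loop above: it rewrites the whole path, so a bad character in a parent directory changes the target path of everything beneath it, and renaming a directory before its contents invalidates paths that find has yet to visit. A minimal sketch of one way around this, assuming bash and a find that supports -depth and -print0 (GNU and BSD find both do). The function name sanitize_tree is hypothetical, and the character list is the one from the answer with the duplicate removed:

```bash
#!/usr/bin/env bash
# Sketch: walk depth-first and change only the last path component, so a
# renamed directory never invalidates paths still waiting in the list.
# sanitize_tree is a hypothetical helper name, not part of the answer above.
sanitize_tree() {
    local root=$1 file dir base target
    find "$root" -depth -name '*[?*,#;:&+<>%$~]*' -print0 |
    while IFS= read -r -d '' file; do
        dir=${file%/*}        # parent directory, left untouched
        base=${file##*/}      # the only component that gets cleaned
        target="$dir/${base//[?*,#;:&+<>%$~]/-}"
        if [ -e "$target" ]; then
            echo "WARNING: $target already exists, file not renamed" >&2
        else
            mv "$file" "$target"
        fi
    done
}
```

Calling `sanitize_tree .` would process the current directory. The null-delimited read also copes with file names that contain newlines.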
An enhancement idea
If you choose a suitable escape character (e.g. =), you can rename files reversibly (but only if the length of the new name wouldn't exceed the maximum allowed length for a filename). One possible algorithm:
replace all = in the filename with =3D
replace all the other disallowed characters with =hh where hh is the appropriate ASCII code in hex
You can reverse the renaming by:
replace all =hh (except =3D) with the character corresponding to ASCII code hh
replace all =3D with =
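The scheme above can be sketched in bash as follows. This is an illustration under assumptions, not a tested tool: encode_name and decode_name are hypothetical helper names, the character list is the one used earlier in the answer, and decoding is done in a single left-to-right pass (rather than the two steps listed) so that a decoded = is never re-read as the start of another escape.

```bash
#!/usr/bin/env bash
# Sketch of the reversible =hh scheme, assuming bash.
# encode_name and decode_name are hypothetical helper names.
encode_name() {
    local in=$1 out='' c i
    for ((i = 0; i < ${#in}; i++)); do
        c=${in:i:1}
        case $c in
            '='|'?'|'*'|','|'#'|';'|':'|'&'|'+'|'<'|'>'|'%'|'$'|'~')
                # A leading quote makes printf use the character's numeric code
                printf -v c '=%02X' "'$c" ;;
        esac
        out+=$c
    done
    printf '%s\n' "$out"
}

decode_name() {
    local in=$1 out='' c hex i
    for ((i = 0; i < ${#in}; i++)); do
        c=${in:i:1}
        if [[ $c == '=' && ${in:i+1:2} =~ ^[0-9A-F]{2}$ ]]; then
            hex=${in:i+1:2}
            printf -v c "\\x$hex"   # turn the hex code back into a character
            ((i += 2))              # skip the two hex digits just consumed
        fi
        out+=$c
    done
    printf '%s\n' "$out"
}
```

For example, `encode_name 'a=b?c*.txt'` gives `a=3Db=3Fc=2A.txt`, and passing that result to decode_name restores the original name.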

Related

Issue using diff with array and value quoted SHELL [duplicate]

Hi, I'm having an issue while using diff.
In my script I'm trying to compare all files in one directory to all files in two other directories,
using diff to check whether the files are the same.
Here is my script:
`
#!/bin/bash
files1=()
files2=()
# Directories to compare. Adding quotes at the begining and at the end of each files found in content1 & content3
content2=$(find /data/logs -name "*.log" -type f)
content1=$(find /data/other/logs1 -type f | sed 's/^/"/g' | sed 's/$/"/g')
content3=$(find /data/other/logs2 -type f | sed 's/^/"/g' | sed 's/$/"/g')
# ADDING CONTENT INTO FILES1 & FILES2 ARRAY
while read -r line; do
files1+=("$line")
done <<< "$content1"
# content1 and content3 goes into the same array
while read -r line3;do
files1+=("$line3")
done <<< "$content3"
while read -r line2; do
files2+=("$line2")
done <<< "$content2"
# Here i'm trying to compare 1 by 1 the files in files2 to all files1
for ((i=0; i<${#files2[@]}; i++))
do
for ((j=0; j<${#files1[@]}; j++))
do
if [[ -n ${files2[$i]} ]];then
diff -s "${files2[$i]}" "${files1[$j]}" > /dev/null
if [[ $? == 0 ]]; then
echo ${files1[$j]} "est identique a" ${files2[$i]}
unset 'files2[$i]'
break
fi
fi
done
done
#SHOW THE FILES WHO DIDN'T MATCHED
echo ${files2[@]}
`
I get the following error when I try to diff:
diff: "/data/content3/other/log2/perso log/somelog.log": No such file or directory
But when I run
ll "/data/content3/other/log2/perso log/somelog.log"
I get
-rw-rw-r-- 2 lopom lopom 551M 30 oct. 18:53 '/data/content3/other/logs2/perso log/somelog.log'
So the file exists.
I need those quotes because there are sometimes spaces in the path.
Does anyone know how to fix that?
Thanks.
I already tried replacing the double quotes with single quotes, but that didn't fix it.
First, don't do this -
content2=$(find /data/logs -name "*.log" -type f)
content1=$(find /data/other/logs1 -type f | sed 's/^/"/g' | sed 's/$/"/g')
content3=$(find /data/other/logs2 -type f | sed 's/^/"/g' | sed 's/$/"/g')
Don't stack all of these into single variables. This is asking for ten kinds of obscure trouble. More importantly, those sed calls embed the quotation marks into the data as part of the filenames, which is probably what makes diff fail: there are no actual files with quotes in their names.
Also, if you are throwing away the output and just using diff to check whether the files are identical, try cmp instead. With -s, cmp is silent, and it's a lot faster since it exits at the first differing byte without reading the rest of both files and generating a report. If there are a lot of files, this will add up.
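If the arrays from the question are still wanted, one way to fill them without adding quote characters is to let find emit null-delimited paths and read them with mapfile. This is a sketch under assumptions: it needs bash 4.4 or later for mapfile -d, compare_trees is a hypothetical helper name, and the "no match" message is invented for illustration.

```bash
#!/usr/bin/env bash
# Sketch: load find results into arrays (no quotes added to the data),
# then compare with cmp -s. Requires bash 4.4+ for mapfile -d ''.
# compare_trees is a hypothetical helper: first argument is the directory
# holding the *.log files, remaining arguments are directories to search.
compare_trees() {
    local src=$1; shift
    local -a files1 files2
    local f c
    mapfile -d '' -t files2 < <(find "$src" -name '*.log' -type f -print0)
    mapfile -d '' -t files1 < <(find "$@" -type f -print0)
    for f in "${files2[@]}"; do
        for c in "${files1[@]}"; do
            # Quoting the expansions is what protects spaces in the paths
            if cmp -s "$f" "$c"; then
                echo "$c est identique a $f"
                continue 2
            fi
        done
        echo "no match for $f"
    done
}
```

With the paths from the question, the call would look like `compare_trees /data/logs /data/other/logs1 /data/other/logs2`.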
If the logs are the only things in the directories, and you don't have to scan subdirectories, and the filename can't appear in both /data/other/logs1 AND /data/other/logs2, but you're pretty sure it will be in at least one of them... then simplify:
for f in /data/logs/*.log # I'll assume these are all files...
do t=( /data/other/logs[12]/"${f#/data/logs/}" ) # array assignment so the glob expands; "$t" is the first match
if cmp -s "$f" "$t" # cmp -s *has* no output
then echo "$t est identique a $f" # files are same
elif [[ -e "$t" ]] # check t exists
then echo "$t diffère de $f" # maybe ls -l "$f" "$t" ?
else echo "$t n'existe pas" # report it does not
fi
done
This needs no arrays, no find, no sed calls, etc.
If you do need to read subdirectories, use shopt to handle it with globs so that you don't have to worry about parsing odd characters with read. (c.f. https://mywiki.wooledge.org/ParsingLs for some reasons.)
shopt -s globstar
for f in /data/logs/**/*.log # globstar makes ** match at arbitrary depth
do for t in /data/other/logs[12]/**/"${f#/data/logs/}" # if >1 possible hit
do if cmp -s "$f" "$t"
then echo "$t est identique a $f"
elif [[ -e "$t" ]]
then echo "$t diffère de $f"
else echo "$t n'existe pas" # $t will be the glob, one iteration
fi
done
done

Using mapfile to save output to associative arrays

While practicing bash, I tried writing a script that searches the home directory for duplicates of the files in a given directory and deletes them. Here's what my script looks like now.
#!/bin/bash
# create-list: create a list of regular files in a directory
declare -A arr1 sumray origray
if [[ -d "$HOME/$1" && -n "$1" ]]; then
echo "$1 is a directory"
else
echo "Usage: create-list Directory | options" >&2
exit 1
fi
for i in $HOME/$1/*; do
[[ -f $i ]] || continue
arr1[$i]="$i"
done
for i in "${arr1[@]}"; do
Name=$(sed 's/[][?*]/\\&/g' <<< "$i")
dupe=$(find ~ -name "${Name##*/}" ! -wholename "$Name")
if [[ $(find ~ -name "${Name##*/}" ! -wholename "$Name") ]]; then
mapfile -t sumray["$i"] < <(find ~ -name "${Name##*/}" ! -wholename "$Name")
origray[$i]=$(md5sum "$i" | cut -c 1-32)
fi
done
for i in "${!sumray[@]}"; do
poten=$(md5sum "$i" | cut -c 1-32)
for i in "${!origray[@]}"; do
if [[ "$poten" = "${origray[$i]}" ]]; then
echo "${sumray[$i]} is a duplicate of $i"
fi
done
done
Originally, where mapfile -t sumray["$i"] < <(find ~ -name "${Name##*/}" ! -wholename "$Name") is now, my line was the following:
sumray["$i"]=$(find ~ -name "${Name##*/}" ! -wholename "$Name")
This saved the output of find to the array. But I had an issue. If a single file had multiple duplicates, then all locations found by find would be saved to a single value. I figured I could use the mapfile command to fix this, but now it's not saving anything to my array at all. Does it have to do with the fact that I'm using an associative array? Or did I just mess up elsewhere?
I'm not sure if I'm allowed to answer my own question, but I figured that I should post how I solved my problem.
As it turns out, the mapfile command does not work on associative arrays at all. So my fix was to save the output of find to a text file and then store that information in an indexed array. I tested this a few times and I haven't seemed to encounter any errors yet.
Here's my finished script.
#!/bin/bash
# create-list: create a list of regular files in a directory
declare -A arr1 origray
declare indexray
#Verify that Parameter is a directory.
if [[ -d "$HOME/$1/" && -n "$1" ]]; then
echo "Searching for duplicates of files in $1"
else
echo "Usage: create-list Directory | options" >&2
exit 1
fi
#create list of files in specified directory
for i in $HOME/${1%/}/*; do
[[ -f $i ]] || continue
arr1[$i]="$i"
done
#search for all duplicate files in the home directory
#by name
#find checksum of files in specified directory
for i in "${arr1[@]}"; do
Name=$(sed 's/[][?*]/\\&/g' <<< "$i")
if [[ $(find ~ -name "${Name##*/}" ! -wholename "$Name") ]]; then
find ~ -name "${Name##*/}" ! -wholename "$Name" >> temp.txt
origray[$i]=$(md5sum "$i" | cut -c 1-32)
fi
done
#create list of duplicate file locations.
if [[ -f temp.txt ]]; then
mapfile -t indexray < temp.txt
else
echo "No duplicates were found."
exit 0
fi
#compare similarly named files by checksum and delete duplicates
count=0
for i in "${!indexray[@]}"; do
poten=$(md5sum "${indexray[$i]}" | cut -c 1-32)
for i in "${!origray[@]}"; do
if [[ "$poten" = "${origray[$i]}" ]]; then
echo "${indexray[$count]} is a duplicate of a file in $1."
fi
done
count=$((count+1))
done
rm temp.txt
This is kind of sloppy but it does what it's supposed to do. md5sum may not be the optimal way to check for file duplicates but it works. All I have to do is replace echo "${indexray[$count]} is a duplicate of a file in $1." with rm -i ${indexray[$count]} and it's good to go.
So my next question would have to be...why doesn't mapfile work with associative arrays?
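On that last question: as far as the bash manual describes it, mapfile reads lines into an indexed array variable, and bash arrays cannot contain other arrays, so no single slot of an associative array can receive a list. One workaround, sketched here under assumptions, is to keep a newline-separated string per key and split it only when needed. group_by_checksum is a hypothetical helper name; it assumes md5sum is installed and that file names contain no newlines.

```bash
#!/usr/bin/env bash
# Sketch: one newline-separated string per checksum instead of nested arrays.
# group_by_checksum is a hypothetical helper; it takes file paths as arguments.
group_by_checksum() {
    local -A by_sum=()
    local -a group
    local f sum
    for f in "$@"; do
        [[ -f $f ]] || continue
        sum=$(md5sum < "$f")
        sum=${sum%% *}               # keep only the digest
        by_sum[$sum]+=$f$'\n'        # append this path to the key's list
    done
    for sum in "${!by_sum[@]}"; do
        # Split the stored string back into an indexed array
        mapfile -t group <<< "${by_sum[$sum]%$'\n'}"
        for f in "${group[@]:1}"; do
            echo "$f is a duplicate of ${group[0]}"
        done
    done
}
```

Grouping by checksum also removes the need for the temporary file and the separate counter. Before deleting anything, confirming each pair with cmp would guard against checksum collisions.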

Script terminates prematurely after do loop. Last "echo" is not executed.

I am trying to run the script below. The issue I am encountering is that nothing after the do loop executes. It doesn't matter what I put after the loop; it never executes, so I am definitely missing something somewhere.
Also, any suggestions on a more efficient way of writing this script? I am very new to scripting and very open to better ways of going about things.
#!/bin/bash
# mcidas environment
PATH=$HOME/bin:
PATH=$PATH:/usr/sww/bin:/usr/local/bin:$HOME/mcidas/bin
PATH=$PATH:/home/mcidas/bin:/bin:/usr/bin:/etc:/usr/ucb
PATH=$PATH:/usr/bin/X11:/common/tool/bin:.
export PATH
MCPATH=$HOME/mcidas/data
MCPATH=$MCPATH:/home/mcidas/data
export MCPATH
#variables
basedir1="ftp://ladsweb.nascom.nasa.gov/allData/6/MOD02QKM" #TERRA
basedir2="ftp://ladsweb.nascom.nasa.gov/allData/6/MYD02QKM" #AQUA
day=`date +%j`
day1=`date +"%j" -d "-1 day"`
hour=`date -u +"%H"`
min=`date -u +"%m"`
year=`date -u +"%Y"`
segment=1
count=$(ls /satellite/modis_processed/ | grep -v ^d | wc -l)
count_max=25
files=(/satellite/modis_processed/*)
if [ $hour -ge "17" ]; then
workinghour="16"
echo "Searching for hour $workinghour"
url="${basedir2}/${year}/${day1}/MYD02QKM.A${year}${day1}.${workinghour}*.006.${year}*"
wget -r -nd --no-parent -nc -e robots=off -R 'index.*' -P /satellitemodis/ $url
#find /satellite/modis/ -type f -mmin -30 -exec cp "{}" /satellite/modis_processed/ \;
for files in /satellite/modis_processed/*
do
echo "The number used for the data file is ${count}"
echo "The number used for the image file is ${segment}"
export segment
export count
#Run McIDAS
mcenv <<- 'EOF'
imgcopy.k MODISD.${count} MODISI.${segment} BAND=1 SIZE=SAME
imgremap.k MODISD.${segment} MODISI.${segment} BAND=1 SIZE=ALL PRO=MERC
imgcha.k MODISI.${segment} CTYPE=BRIT
exit
EOF
segment=`expr ${segment} + 1`
count=`expr ${count} - 1`
#Reset Counter if equal or greater than 25
if [[ $segment -ge $count_max ]]; then
segment=1
fi
find /satellite/awips -type f -name "AREA62*" -exec mv "{}" /awips2/edex/data/manual/ \;
done;
echo "We have exported ${segment} converted modis files to EDEX."
fi
You have a here-document in that script. That here-document is not properly terminated. The end marker, EOF, needs to be in the first column, not indented at all.
If you indent it, it has to be with tabs, and the start of the here-document should be <<-'EOF'.
The effect of the wrongly indented EOF marker is that the rest of the script is read as the contents of the here-document.
As Charles Duffy points out, ShellCheck is your friend.
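A small sketch of the terminator rule, using cat in place of mcenv so that it runs anywhere. It also shows a side effect of quoting the delimiter: with 'EOF' the calling shell does not expand ${segment}, so the program reading the text has to do that itself, which is presumably why the question exports segment and count. The variable value here is made up for illustration.

```bash
#!/usr/bin/env bash
# Sketch: each terminator sits in column one, so the here-document ends
# where intended and the commands after it still run.
segment=3

# Unquoted delimiter: the calling shell expands ${segment} before cat sees it.
expanded=$(cat <<EOF
imgcha.k MODISI.${segment} CTYPE=BRIT
EOF
)

# Quoted delimiter: the text is passed through verbatim.
literal=$(cat <<'EOF'
imgcha.k MODISI.${segment} CTYPE=BRIT
EOF
)

echo "$expanded"
echo "$literal"
echo "reached the line after the here-documents"
```

If the final echo prints, the here-documents were closed correctly. With an indented terminator and no <<-, the shell would instead treat everything up to the end of the file as here-document text.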

echo a variable and then grep to see if value exist in a file is not returning anything. Unix Shell Scripting

I'm trying to determine whether a variable matches a value from a file using grep, but it is not returning anything, so let me explain.
My code is this:
MyFiles="MyFile-I-20160606_141_Employees.txt"
DirFiles="/dev/fs/C/Users/salasfri/Desktop/MyFiles.txt"
for OutFile in $(cat $DirFiles); do
if [[ $( echo $MyFiles | grep -c $OutFile ) -gt 0 ]]; then
print "The file $OutFile exist!!"
fi
done
and the file in /dev/fs/C/Users/salasfri/Desktop/MyFiles.txt contains the following values:
MyFile-I-*_141_Employees.txt
MyFile-I-*_141_Products.txt
MyFile-I-*_141_Deparments.txt
The idea is to verify whether the value of the variable "MyFiles" is found in the MyFiles.txt file. As you can see, the entries use the pattern "*" because that part is a date, which will change.
This solution is not returning any count of files. Is there something I'm doing wrong?
You can try to change the searchstring before searching.
An example with three teststrings:
for teststring in MyFile-I-20160606_141_Employees.txt MyFile-I-20160606_142_Employees.txt MyFile-I-20160606_141_Others.txt
do
grepstr=$(sed 's/[0-9]\{8\}_/*_/' <<< "${teststring}")
fgrep "${grepstr}" "${DirFiles}"
found=$(fgrep "${grepstr}" "${DirFiles}")
if [ $? -eq 0 ]; then
echo "${found} matches ${teststring}."
fi
done
In your case you can make the code shorter with
fgrep -q "$(sed 's/[0-9]\{8\}_/*_/' <<< "${MyFiles}")" $DirFiles &&
echo "The file $(sed 's/[0-9]\{8\}_/*_/' <<< "${MyFiles}") exist!!"
Your patterns are glob-style patterns, not regular expressions. The pattern abc-*_X.txt will not match the string abc-1234_X.txt.
You want to use a shell construct that does glob matching.
MyFiles="MyFile-I-20160606_141_Employees.txt"
sed 's/\r$//' "/dev/fs/C/Users/salasfri/Desktop/MyFiles.txt" \
| while IFS= read -r Pattern; do
if [[ $MyFiles == $Pattern ]]; then
print "$MyFiles matches pattern $Pattern"
break
fi
done
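To see why grep finds nothing: read as a regular expression, the `-*` in MyFile-I-*_141_Employees.txt means "zero or more hyphens", so the date digits have nothing to match against. A self-contained sketch of the glob approach follows, written for bash (the [[ ]] test is also available in ksh). matches_any is a hypothetical helper name, and the patterns are passed as arguments instead of being read from the file.

```bash
#!/usr/bin/env bash
# Sketch: match a file name against glob-style patterns with [[ == ]].
# matches_any is a hypothetical helper: name first, then the patterns.
matches_any() {
    local name=$1 pattern
    shift
    for pattern in "$@"; do
        # The right-hand side is left unquoted on purpose so that * acts as
        # a wildcard; quoting it would force a literal string comparison.
        if [[ $name == $pattern ]]; then
            printf '%s\n' "$pattern"
            return 0
        fi
    done
    return 1
}

matches_any "MyFile-I-20160606_141_Employees.txt" \
    'MyFile-I-*_141_Employees.txt' 'MyFile-I-*_141_Products.txt'
```

The call at the end prints the first pattern that matches, here MyFile-I-*_141_Employees.txt, and the function returns a non-zero status when no pattern matches.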

how to split files from error.log file of apache server while reading file line by line continuously?

I have written a shell script for this, but I am having trouble with it. I am confused about how the tail command works, and when I view the output of error.log on the terminal, it shows lines with the letter 'e' deleted from words.
Please guide me on how to solve my problem. I want to read the error.log file line by line and, while reading, split a fixed number of lines into small files with a suffix, i.e. log-aa, log-ab, ... I did this using the split command. After splitting, I want to filter the lines containing the word GET or POST using a regex and store the filtered lines in a new file. Once they are stored, I need to delete all the log-* files.
This is what I have written:
processLine(){
line="$@"
echo $line
$ tail -f $FILE
}
FILE="/var/log/apache2/error.log"
if [ "$1" == "/var/log/apache2/error.log" ]; then
FILE="/dev/stdin"
else
FILE="$1"
if [ ! -f $FILE ]; then
echo "$FILE : does not exists"
exit 1
elif [ ! -r $FILE ]; then
echo "$FILE: can not read"
exit 2
fi
fi
#BAKIFS=$IFS
#IFS=$(echo -en "\n\b")
exec 3<&0
exec 0<"$FILE"
#sed -e 's/\[debug\].*\(data\-HEAP\)\:\/-->/g' error.log > /var/log/apache2/error.log.1
while read -r line
do
processLine $line
done
exec 0<&3
IFS=$BAKIFS
logfile="/var/log/apache2/error.log"
pattern="bytes"
# read each new line as it gets written
# to the log file
#tail -1 $logfile
tail -fn0 $logfile | while read line ; do
# check each line against our pattern
echo "$line" | grep -i "$pattern"
#sed -e 's/\[debug\].*\(data\-HEAP\)\:/-->/g' error.log >/var/log/apache2/error.log
split -l 1000 error.log log-
FILE2="/var/log/apache2/log-*"
if [ "$1" == "/var/log/apache2/log-*" ]; then
FILE2="/dev/stdin"
else
FILE2="$1"
if [ ! -f $FILE2 ]; then
echo "$FILE : does not exists"
exit 1
elif [ ! -r $FILE2 ]; then
echo "$FILE: can not read"
exit 2
fi
fi
BAKIFS=$IFS
IFS=$(echo -en "\n\b")
exec 3<&0
exec 0<"$FILE2"
while read -r line
do
processLine $line
echo $line >>/var/log/apache2/url.txt
done
exec 0<&3
IFS=$BAKIFS
find . -name "var/log/apache2/logs/log-*.*" -delete
done
exit 0
The code below deletes the files after reading and splitting error.log, but when I add tail -f $FILE it stops deleting them. I want to delete the log-* files after it reaches the last line of error.log:
processLine(){
line="$@"
echo $line
}
FILE=""
if [ "$1" == "" ]; then
FILE="/dev/stdin"
else
FILE="$1"
# make sure file exist and readable
if [ ! -f $FILE ]; then
echo "$FILE : does not exists"
exit 1
elif [ ! -r $FILE ]; then
echo "$FILE: can not read"
exit 2
fi
fi
#BAKIFS=$IFS
#IFS=$(echo -en "\n\b")
exec 3<&0
exec 0<"$FILE"
while read -r line
do
processLine $line
split -l 1000 error.log log-
cat log-?? | grep "GET\|POST" > storefile
#tail -f $FILE
done
rm log-??
exec 0<&3
#IFS=$BAKIFS
exit 0
Your code seems unnecessarily long and complex, and the logic is unclear. It should not have been allowed to grow this big without working correctly.
Consider this:
split -l 1000 error.log log-
cat log-?? | grep "GET\|POST" > storefile
rm log-??
Experiment with this. If these three commands do what you expect, you can add more functionality (e.g. using paths, checking for the existence of error.log), but don't add code until you have this part working.
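Once the three commands behave as expected, they could be wrapped along these lines. This is only a sketch: extract_requests is a hypothetical name, and the pieces go into a temporary directory (an assumption, not something the question asks for) so that cleanup cannot touch unrelated files.

```bash
#!/usr/bin/env bash
# Sketch: split, filter, clean up, wrapped in a function so it can be tried
# on a copy of the log first. extract_requests is a hypothetical name.
extract_requests() {
    local log=$1 out=$2 workdir
    [[ -r $log && -s $log ]] || { echo "$log: missing, unreadable, or empty" >&2; return 2; }
    workdir=$(mktemp -d) || return 1
    split -l 1000 "$log" "$workdir/log-"       # pieces log-aa, log-ab, ...
    cat "$workdir"/log-?? | grep -E 'GET|POST' > "$out"
    rm -r "$workdir"                           # remove the pieces again
}
```

A call such as `extract_requests error.log storefile` reproduces the three commands. As for the tail -f attempt in the question: tail -f never exits on its own, so any cleanup placed after it is never reached. That would explain why the log-* files stop being deleted.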
