checking if owner/group match in files and subfolders - file

I need help writing a shell script to check whether the owner and group match the names I have in my IF statement. It needs to recursively check all files and folders, including the parent folder.
For example, my directory structure might look like this:
/data
/data/folder1
/data/folder1/fileA
/data/folder2/fileB
I need to verify that data, folder1, folder2, fileA, fileB are all owned by the same owner and group.
#!/bin/bash
DIR="/data";
N=0;
for $DIR..
if [ NOT MATCH "username:groupname" ]; then
N=1;
fi
done
if [ $N -gt 0 ]; then
echo "all or some files and folders don't match";
else
echo "all files match";
fi

#!/bin/bash
EXPECTED="user:group"   # expected owner:group
DIR="/data"
N=0

# Check DIR and everything below it; stop at the first mismatch.
not_match()
{
    local file
    while IFS= read -r -d '' file; do
        # stat -c '%U:%G' (GNU stat) prints the entry's owner:group
        if [ "$(stat -c '%U:%G' "$file")" != "$EXPECTED" ]; then
            N=1
            return
        fi
    done < <(find "$1" -print0)
}

not_match "$DIR"
if [ $N -gt 0 ]; then
    echo "all or some files and folders don't match"
else
    echo "all files match"
fi
This is basically what you want to do. find lists DIR itself and every file and folder below it, and the loop checks them one at a time. If an entry matches and it's not the last one, the loop moves on to the next entry. If it doesn't match, the function sets N and returns right away, since you don't tell the user how many don't match. If the end is reached, everything matched, and the script prints the result to the user.
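As an alternative sketch, find can do the comparison itself, so no per-file stat call is needed. The function names first_mismatch and check_tree are made up for illustration, and -quit assumes GNU find (drop it to list every mismatch instead of only the first).

```bash
#!/bin/bash
# first_mismatch (hypothetical helper): print the first path under DIR,
# DIR itself included, whose owner or group differs from the expected ones.
first_mismatch() {
    local dir=$1 owner=$2 group=$3
    # -quit (GNU find) stops at the first hit.
    find "$dir" \( ! -user "$owner" -o ! -group "$group" \) -print -quit
}

# check_tree (hypothetical helper): return 0 when everything matches,
# 1 on a mismatch, 2 when find itself failed (e.g. unknown user name).
check_tree() {
    local bad
    bad=$(first_mismatch "$@") || { echo "find failed" >&2; return 2; }
    if [ -n "$bad" ]; then
        echo "mismatch found, e.g. $bad"
        return 1
    fi
    echo "all files match"
}
```

It would be called as check_tree /data username groupname. Checking find's exit status matters here, because an unknown user or group name makes find print nothing, which would otherwise look like a clean result.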

Related

Parsing unique data and renaming files

I was trying to create a Perl script to rename the files (hundreds of files with different names), but I have not had any success. I first need to find the unique file number and then rename it to something more human readable. Since file names are not sequential, it makes it difficult.
Examples of file names. The number of importance is after the sequence:
# vv-- this number
lane8-s244-index--ATTACTCG-TATAGCCT-01_S244_L008_R1_001.fastq
lane8-s245-index--ATTACTCG-ATAGAGGC-02_S245_L008_R1_001.fastq
lane8-s246-index--TCCGGAGA-TATAGCCT-09_S246_L008_R1_001.fastq
lane8-s247-index--TCCGGAGA-ATAGAGGC-10_S247_L008_R1_001.fastq
lane8-s248-index--TCCGGAGA-CCTATCCT-11_S248_L008_R1_001.fastq
lane8-s249-index--TCCGGAGA-GGCTCTGA-12_S249_L008_R1_001.fastq
lane8-s250-index--TCCGGAGA-AGGCGAAG-13_S250_L008_R1_001.fastq
lane8-s251-index--TCCGGAGA-TAATCTTA-14_S251_L008_R1_001.fastq
lane7-s0007-index--ATTACTCG-TATAGCCT-193_S7_L007_R1_001.fastq
lane7-s0008-index--ATTACTCG-ATAGAGGC-105_S8_L007_R1_001.fastq
lane7-s0009-index--ATTACTCG-CCTATCCT-195_S9_L007_R1_001.fastq
lane7-s0010-index--ATTACTCG-GGCTCTGA-106_S10_L007_R1_001.fastq
lane7-s0011-index--ATTACTCG-AGGCGAAG-197_S11_L007_R1_001.fastq
lane7-s0096-index--AGCGATAG-CAGGACGT-287_S96_L007_R1_001.fastq
I have created a file called RENAMING_parse_data.sh that references RENAMING_parse_data.pl
So in theory the idea is that it parses the data to find the sample # that is in the middle of the name, takes that unique ID, and renames the file. But I don't think it's even going into the IF block.
Any ideas?
HERE IS THE .sh file that calls the Perl script
#!/bin/bash
#first part is the program
#second is the directory path
#third and fourth times are the names of the output files
#./parse_data.pl /ACTF/Course/PATHTDIRECTORY Tabsummary.txt Strucsummary.txt
#WHERE ./parse_data.pl =name of the program
#WHERE /ACTF/Course/PATHTODIRECTORY = directory path were your field are saved AND is referred to as $dir_in = $ARGV[0] in the perl script;
#new files you recreating with the extracted data AND is refered to as $dir_in = $ARGV[1];
./RENAMING_parse_data.pl ./Test/ FishList.txt
HERE IS THE PERL SCRIPT:
#!/usr/bin/perl
print (":)\n");
#Proesessing files in a directory
$dir_in = $ARGV[0];
$indv_list = $ARGV[1];
#open directory to acess those files, the folder where you have the files
opendir(DIR, $dir_in) || die ("Cannot open $dir_in");
@files = readdir(DIR);
#set all variables = 0 to void chaos
$j=0;
#open output header line for output file and print header line for tab delimited file
open(OUTFILETAB, ">", $indv_list);
print(OUTFILETAB "\t Fish ID", "\t");
#open each file
foreach (@files){
#re start all arrays to void chaos
print("in loop [$j]");
@acc_ID=();
#find FISH name
#EXAMPLE FISH NAMES: (lenth of fishname varies)
#lane8-s251-index--TCCGGAGA-TAATCTTA-14_S251_L008_R1_001.fastq.gz
#lane7-s0096-index--AGCGATAG-CAGGACGT-287_S96_L007_R1_001.final.fastq
#NOTE: what is in btween () is the ID that is printed NOTE that value can change from 2 -3 depending on Sample #
#Trials:
#lane[0-9]{1}-[a-z]{1}[0-9]{4}-index--[A-Z]{8}[A-Z]{8}-([0-9]{3})[a-z]{1}[0-9]{2}_[A-Z]{1}[0-9]{3}_[a-z]{1}[0-9]{1}_[0-9]{3}.fastq
#lane[0-9]{1}-[a-z]{1}[0-9]{4}-index--[A-Z]{8}[A-Z]{8}-([0-9]{3})*.fastq
#lane*([0-9]{3})*.fastq
#lane.*-([0-9]{2})_.*.fastq
#lane.*-([0-9]{2})_*.fastq
#lane[0-9]{1}-[a-z]{1}[0-9]{3}-index--[A-Z]{8}[A-Z]{8}-([0-9]{2})_[A-Z]{1}[0-9]{3}_L008_R1_001.fastq
$string_FISH = @files;
if ($string_FISH =~ /^lane[0-9]{1}-[a-z]{1}[0-9]{3}-index--[A-Z]{8}[A-Z]{8}-([0-9]{2})_[A-Z]{1}[0-9]{3}_L008_R1_001.fastq/){
$FISH_ID =$1;
@acc_ID[$j] = $FISH_ID;
#print ("FISH. = |$FISH_ID[$j]| \n");
rename($string_FISH, "FISH. = |$FISH_ID[$j]|");
#print ($acc_ID[$j], "\n");
print(OUTFILETAB "FISH. = |$FISH_ID[$j]| \n");
}
$j= $j+1;
}
IDEAL END RESULT
So in the end I would like it to take the file name, find the unique identifier and rename it
from :
lane8-s244-index--ATTACTCG-TATAGCCT-01_S244_L008_R1_001.fastq
lane7-s0007-index--ATTACTCG-TATAGCCT-193_S7_L007_R1_001.fastq
to:
Fish.01.fastq
Fish.193.fastq
Any ideas or suggestions on how to fix this, or whether it needs to change completely, are greatly appreciated.
At the core of a Perl solution, you could use
s/^.*-(\d+)_[^-]+(?=\.fastq\z)/Fish.$1/sa
For example,
$ ls -1 *.fastq
lane8-s244-index--ATTACTCG-TATAGCCT-01_S244_L008_R1_001.fastq
lane8-s245-index--ATTACTCG-ATAGAGGC-02_S245_L008_R1_001.fastq
lane8-s246-index--TCCGGAGA-TATAGCCT-09_S246_L008_R1_001.fastq
lane8-s247-index--TCCGGAGA-ATAGAGGC-10_S247_L008_R1_001.fastq
lane8-s248-index--TCCGGAGA-CCTATCCT-11_S248_L008_R1_001.fastq
lane8-s249-index--TCCGGAGA-GGCTCTGA-12_S249_L008_R1_001.fastq
$ rename 's/^.*-(\d+)_[^-]+(?=\.fastq\z)/Fish.$1/sa' *.fastq
$ ls -1 *.fastq
Fish.01.fastq
Fish.02.fastq
Fish.09.fastq
Fish.10.fastq
Fish.11.fastq
Fish.12.fastq
(There are two similar tools named rename. This one is also known as prename.)
It's pretty simple to implement yourself:
#!/usr/bin/perl
use strict;
use warnings;
my $errors = 0;
for (@ARGV) {
my $old = $_;
s/^.*-(\d+)_[^-]+(?=\.fastq\z)/Fish.$1/sa;
my $new = $_;
next if $new eq $old;
if ( -e $new ) {
warn( "Can't rename \"$old\" to \"$new\": Already exists\n" );
++$errors;
}
elsif ( !rename( $old, $new ) ) {
warn( "Can't rename \"$old\" to \"$new\": $!\n" );
++$errors;
}
}
exit( !!$errors );
Provide the files to rename as arguments (e.g. using *.fastq from the shell).
$ ls -1 *.fastq
lane8-s244-index--ATTACTCG-TATAGCCT-01_S244_L008_R1_001.fastq
lane8-s245-index--ATTACTCG-ATAGAGGC-02_S245_L008_R1_001.fastq
lane8-s246-index--TCCGGAGA-TATAGCCT-09_S246_L008_R1_001.fastq
lane8-s247-index--TCCGGAGA-ATAGAGGC-10_S247_L008_R1_001.fastq
lane8-s248-index--TCCGGAGA-CCTATCCT-11_S248_L008_R1_001.fastq
lane8-s249-index--TCCGGAGA-GGCTCTGA-12_S249_L008_R1_001.fastq
$ ./a *.fastq
$ ls -1 *.fastq
Fish.01.fastq
Fish.02.fastq
Fish.09.fastq
Fish.10.fastq
Fish.11.fastq
Fish.12.fastq
The existence check (-e) is to prevent accidentally renaming a bunch of files to the same name and therefore losing all but one of them.
The above is a cleaned-up version of a one-liner pattern I often use.
dir /b ... | perl -nle"$o=$_; s/.../.../; $n=$_; rename$o,$n if!-e$n"
Adapted to sh:
\ls ... | perl -nle'$o=$_; s/.../.../; $n=$_; rename$o,$n if!-e$n'
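For comparison, roughly the same rename can be sketched in pure bash, using its built-in regex matching in place of the Perl substitution. The function names fish_name and rename_fish are made up for illustration, and the sketch assumes the files sit in the current directory.

```bash
#!/bin/bash
# fish_name (hypothetical helper): print the new name for one file,
# or nothing if the name does not match the expected pattern.
fish_name() {
    local re='^.*-([0-9]+)_[^-]+\.fastq$'
    if [[ $1 =~ $re ]]; then
        printf 'Fish.%s.fastq\n' "${BASH_REMATCH[1]}"
    fi
}

# rename_fish (hypothetical helper): rename every matching file given as
# an argument, refusing to overwrite a file that already exists.
rename_fish() {
    local old new errors=0
    for old in "$@"; do
        new=$(fish_name "$old")
        # skip names that do not match or would not change
        [[ -n $new && $new != "$old" ]] || continue
        if [[ -e $new ]]; then
            echo "Can't rename \"$old\" to \"$new\": already exists" >&2
            errors=1
        else
            mv -- "$old" "$new" || errors=1
        fi
    done
    return "$errors"
}
```

Called as rename_fish *.fastq, it keeps the same existence check as the Perl version, so two files that map to the same number are reported rather than silently overwritten.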

Bash: rm with an array of filenames

So I'm working on making an advanced delete script. The idea is the user inputs a grep regex for what needs to be deleted, and the script does an rm operation for all of it. Basically eliminates the need to write all the code directly in the command line each time.
Here is my script so far:
#!/bin/bash
# Script to delete files passed to it
if [ $# -ne 1 ]; then
echo "Error! Script needs to be run with a single argument that is the regex for the files to delete"
exit 1
fi
IFS=$'\n'
files=$(ls -a | grep $1 | awk '{print "\"" $0 "\"" }')
## TODO ensure directory support
echo "This script will delete the following files:"
for f in $files; do
echo " $f"
done
valid=false
while ! $valid ; do
read -p "Do you want to proceed? (y/n): "
case $REPLY in
y)
valid=true
echo "Deleting, please wait"
echo $files
rm ${files}
;;
n)
valid=true
;;
*)
echo "Invalid input, please try again"
;;
esac
done
exit 0
My problem is when I actually do the "rm" operation. I keep getting errors saying No such file or directory.
This is the directory I'm working with:
drwxr-xr-x 6 user staff 204 May 9 11:39 .
drwx------+ 51 user staff 1734 May 9 09:38 ..
-rw-r--r-- 1 user staff 10 May 9 11:39 temp two.txt
-rw-r--r-- 1 user staff 6 May 9 11:38 temp1.txt
-rw-r--r-- 1 user staff 6 May 9 11:38 temp2.txt
-rw-r--r-- 1 user staff 10 May 9 11:38 temp3.txt
I'm calling the script like this: easydelete.sh '^tem'
Here is the output:
This script will delete the following files:
"temp two.txt"
"temp1.txt"
"temp2.txt"
"temp3.txt"
Do you want to proceed? (y/n): y
Deleting, please wait
"temp two.txt" "temp1.txt" "temp2.txt" "temp3.txt"
rm: "temp two.txt": No such file or directory
rm: "temp1.txt": No such file or directory
rm: "temp2.txt": No such file or directory
rm: "temp3.txt": No such file or directory
If I try and directly delete one of these files, it works fine. If I even pass that whole string that prints out before I call "rm", it works fine. But when I do it with the array, it fails.
I know I'm handling the array wrong, just not sure exactly what I'm doing wrong. Any help would be appreciated. Thanks.
Consider instead:
# put all filenames containing $1 as literal text in an array
#files=( *"$1"* )
# ...or, use a grep with GNU extensions to filter contents into an array:
# this passes filenames around with NUL delimiters for safety
#files=( )
#while IFS= read -r -d '' f; do
# files+=( "$f" )
#done < <(printf '%s\0' * | egrep --null --null-data -e "$1")
# ...or, evaluate all files against $1, as regex, and add them to the array if they match:
files=( )
for f in *; do
[[ $f =~ $1 ]] && files+=( "$f" )
done
# check that the first entry in that array actually exists
[[ -e $files || -L $files ]] || {
echo "No files containing $1 found; exiting" >&2
exit 1
}
# warn the user
echo "This script will delete the following files:" >&2
printf ' %q\n' "${files[@]}" >&2
# prompt the user
valid=0
while (( ! valid )); do
read -p "Do you want to proceed? (y/n): "
case $REPLY in
y) valid=1; echo "Deleting; please wait" >&2; rm -f "${files[@]}" ;;
n) valid=1 ;;
esac
done
I'll go into the details below:
files has to be explicitly created as an array to actually be an array -- otherwise, it's just a string with a bunch of files in it.
This is an array:
files=( "first file" "second file" )
This is not an array (and, in fact, could be a single filename):
files='"first file" "second file"'
A proper bash array is expanded with "${arrayname[@]}" to get all contents, or "$arrayname" to get only the first entry.
[[ -e $files || -L $files ]]
...thus checks the existence (whether as a file or a symlink) of the first entry in the array -- which is sufficient to tell if the glob expression did in fact expand, or if it matched nothing.
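The difference between the three expansions can be seen with a small sketch. The helper count_args is made up for illustration and only reports how many arguments it received.

```bash
#!/bin/bash
# count_args (hypothetical helper): report how many arguments were passed.
count_args() { echo "$#"; }

files=( "first file" "second file" )   # a real array with two entries
string='"first file" "second file"'    # one string that merely looks like a list

count_args "${files[@]}"   # 2: each entry becomes exactly one argument
count_args "$files"        # 1: only the first entry, "first file"
count_args $string         # 4: split on spaces; the quote characters stay literal
```

The last case is what happened in the question: the double quotes added by awk were passed to rm as part of each filename, which is why it reported that no such file exists.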
A boolean is better represented with numeric values than a string containing true or false: Running if $valid has potential to perform arbitrary activity if the contents of valid could ever be set to a user-controlled value, whereas if (( valid )) -- checking whether $valid is a positive numeric value (true) or otherwise (false) -- has far less room for side effects in presence of bugs elsewhere.
There's no need to loop over array entries to print them in a list: printf "$format_string" "${array[@]}" will expand the format string additional times whenever it has more arguments (from the array expansion) than its format string requires. Moreover, using %q in your format string will quote nonprintable values, whitespace, newlines, &c. in a format that's consumable by both human readers and the shell -- whereas otherwise a file created with touch $'evil\n - hiding' will appear to be two list entries, whereas in fact it is only one.
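A minimal sketch of that printf behaviour follows. The helper name list_files is made up for illustration, and the sample filenames are taken from the question and the example above.

```bash
#!/bin/bash
# list_files (hypothetical helper): print each argument on its own line,
# shell-quoted, by letting printf reuse its format string per argument.
list_files() {
    printf ' %q\n' "$@"
}

files=( "temp two.txt" temp1.txt $'evil\n - hiding' )
list_files "${files[@]}"
# Prints three lines, one per array entry:
#  temp\ two.txt
#  temp1.txt
#  $'evil\n - hiding'
```

Because the embedded newline is shown in its escaped form, the third entry stays on one line and cannot be mistaken for two separate files.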

shell script : if array value was greater than a number then run a command

I have a file containing usernames and each user's sent-mail count, one per line. For example (I don't know how many lines it has):
info.txt >
500 example1
40 example2
20 example3
....
..
.
If the number is greater than X, I want to run commands that include the user name and act on that user.
getArray() {
users=() # Create array
while IFS= read -r line # Read a line
do
users+=("$line") # Append line to the array
done < "$1"
}
getArray "/root/.myscripts/spam1/info.txt"
# i know this part is incorrect and need help here :
if [ "${users[1$]}" -gt "50" ]
then
echo "${users[2$] has sent ${users[1$]} emails"
fi
please Help
Thanks
Not knowing how many lines of input you have is no reason to use an array. Indeed, it is generally more useful if you assume your input is infinite (an input stream), so reading into an array is impossible. Just read each line and take action if necessary:
#!/bin/sh
while read -r count user; do
if test "$count" -gt 50; then
echo "$user has sent $count emails"
fi
done < /root/.myscripts/spam1/info.txt
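If the threshold should not be hard-coded, the same loop can be wrapped in a function. This is only a sketch: the name report_senders and its argument order are assumptions, and lines whose first field is not a number (such as the "...." placeholders in the sample) are skipped.

```bash
#!/bin/bash
# report_senders (hypothetical helper): print every user whose count in
# FILE is greater than LIMIT. Usage: report_senders LIMIT FILE
report_senders() {
    local limit=$1 file=$2 count user
    while read -r count user; do
        # skip blank or malformed lines whose first field is not a number
        [[ $count =~ ^[0-9]+$ ]] || continue
        if [ "$count" -gt "$limit" ]; then
            echo "$user has sent $count emails"
        fi
    done < "$file"
}
```

For the sample data, report_senders 50 info.txt would report only example1. The echo line is the place to put whatever command should act on "$user".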

How to put sorted array from perl back into shell

I have been working on a shell program that asks for the name of a file you wish to work with; then, with one of the selections, sorts it using a Perl program. I got the shell program to pass the file into Perl, and sorted the file. But now I am stuck on how to get the result back to the shell and save it into a new file. This is what I tried:
Perl:
use strict;
use warnings;
my $filename = $ARGV[0];
open(MYINPUTFILE, $filename); # open for input
my (@lines) = <MYINPUTFILE>; # read file into list
@lines = sort(@lines); # sort the list
my ($line);
foreach $line (@lines) # loop thru list
{
print "$line"; # print in sort order
}
close(MYINPUTFILE);
This prints the sorted list.
Just for reference this code is taking a file from the shell script and working with it. Here is that code
Shell:
#!/bin/bash
clear
printf "Hello. \nPlease input a filename for a file containing a list of words you would like to use. Please allow for one word per line.\n -> "
read filename
printf "You have entered the filename: $filename.\n"
if [ -f "$filename" ] #check if the file even exists in the current directory to use
then
printf "The file $filename exists. What would you like to do with this file?\n\n"
else
printf "The file: $filename, does not exist. Rerun this shell script and please enter a valid file with it's proper file extension. An example of this would be mywords.txt \n\nNow exiting.\n\n"
exit
fi
printf "Main Menu\n"
printf "=========\n"
printf "Select 1 to sort file using Shell and output to a new file.\n"
printf "Select 2 to sort file using Perl and output to a new file.\n"
printf "Select 3 to search for a word using Perl.\n"
printf "Select 4 to exit.\n\n"
echo "Please enter your selection below"
read selection
printf "You have selected option $selection.\n"
if [ $selection -eq "1" ]
then
read -p "What would you like to call the new file? " newfile #asks user what they want to call the new file that will have the sorted list outputted to it
sort $filename > $newfile
echo "Your file: $newfile, has been created."
fi
if [ $selection -eq "2" ]
then
read -p "What would you like to call the new file? " newfile2
perl sort.pl $filename
# > $newfile2 #put the sorted list into the new output file that the user specificed with newfile2
fi
if [ $selection -eq "3" ]
then
perl search.pl $filename
fi
if [ $selection -eq "4" ]
then
printf "Now exiting.\n\n"
exit
fi
Any help is appreciated, Thanks!
As mentioned in the comments (but moved here to answer form):
Your Perl script outputs the results on STDOUT which means that the calling shell script can redirect it to an output file. You'll use something similar to your option #1.
Change:
perl sort.pl $filename
To:
perl sort.pl $filename > $newfile2
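Since the filenames come from user input, it is safer to quote them so that names containing spaces survive. A small sketch of the redirect, wrapped in a made-up helper called save_output:

```bash
#!/bin/bash
# save_output (hypothetical helper): run a command and write its
# standard output to the file named by the first argument.
save_output() {
    local outfile=$1
    shift
    "$@" > "$outfile"
}

# In the menu script, option 2 could then call:
#   save_output "$newfile2" perl sort.pl "$filename"
```

The helper is not required; writing perl sort.pl "$filename" > "$newfile2" directly does the same thing.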

Dealing with hidden files when making an array of files inside a directory, using Perl

I am using Perl. I am making an array of files inside a directory. Hidden files, ones that begin with a dot, are at the beginning of my array. I want to actually ignore and skip over those, since I do not need them in the array. These are not the files I am looking for.
The solution to the problem seems easy. Just use regular expression to search for and exclude hidden files. Here's my code:
opendir(DIR, $ARGV[0]);
my @files = (readdir(DIR));
closedir(DIR);
print scalar @files."\n"; # used just to help check on how long the array is
for ( my $i = 0; $i < @files; $i++ )
{
# ^ as an anchor, \. for literal . and second . for match any following character
if ( $files[ $i ] =~ m/^\../ || $files[ $i ] eq '.' ) #
{
print "$files[ $i ] is a hidden file\n";
print scalar @files."\n";
}
else
{
print $files[ $i ] . "\n";
}
} # end of for loop
This produces an array @files and shows me the hidden files I have in the directory. Next step is to remove the hidden files from the array @files. So use the shift function, like this:
opendir(DIR, $ARGV[0]);
my @files = (readdir(DIR));
closedir(DIR);
print scalar @files."\n"; # used just to help check on how long the array is
for ( my $i = 0; $i < @files; $i++ )
{
# ^ as an anchor, \. for literal . and second . for match any following character
if ( $files[ $i ] =~ m/^\../ || $files[ $i ] eq '.' ) #
{
print "$files[ $i ] is a hidden file\n";
shift @files;
print scalar @files."\n";
}
else
{
print $files[ $i ] . "\n";
}
} # end of for loop
I get an unexpected result. My expectation is that the script will:
make the array @files,
scan through that array looking for files that begin with a dot,
find a hidden file, tell me it found one, then promptly shift it off the front end of the array @files,
then report to me the size or length of @files,
otherwise, just print the name of the files that I am actually interested in using.
The first script works fine. The second version of the script, the one using the shift function to remove hidden files from @files, does find the first hidden file (. or current directory) and shifts it off. It does not report back to me about .., the parent directory. It also does not find another hidden file that is currently in my directory to test things out. That hidden file is a .DS_Store file. But on the other hand, it does find a hidden .swp file and shifts it out.
I can't account for this. Why does the script work OK for the current directory . but not the parent directory ..? And also, why does the script work OK for a hidden .swp file but not the hidden .DS_Store file?
After shifting a file, every remaining element moves down one position, so your index $i already points to the following file, and the loop's $i++ then skips over it.
You can use grep to get rid of the files whose names start with a dot, no shifting needed:
my @files = grep ! /^\./, readdir DIR;
