How can I remove lines from .txt files using a loop?

How can I remove lines from .txt files using a loop? - file

In my current directory I have a couple of .txt files. I want to write a script to search for a string in those .txt files, and delete lines which contains that string.
For example, I'd like to delete all lines which have the word "start" in all .txt files in my current directory.
I have written the following code, but I don't know how to continue!
#!bin\bash
files=`find . -maxdepth 1 -name \*.txt`
How should I use "while" to go through each file?

Use Globs to Populate Loop Variables
When you use -maxdepth 1 on the current directory, you aren't recursing into subdirectories. If that's the case, there's no need at all to use find just to match files with an extension; you can use shell globs instead to populate your loop constructs. For example:
#!/bin/bash
# Run sed on each file to delete the line.
for file in *txt; do
sed -i '/text to match/d' "$file"
done
This is simple, and avoids a number of filename-related issues that you may have when passing filename arguments between processes. Keep it simple!

Easy cheasy:
sed -i "s/^.*string.*//" *.txt
this will remove any line containing 'string' on each .txt file

You use it along with read to get each filename in turn, after piping the results of find to it. Then you just pass the filename to sed to delete the lines you're interested in.

with open(file_listoflinks, 'r+', encoding='utf-8') as f_link:
lines = f_link.readlines() # read an store all lines into list
f_link.seek(0) # move file pointer to the beginning of a file
f_link.truncate() # truncate the file
# start writing lines except the first line
# lines[1:] from line 2 to last line
f_link.writelines(lines[1:])

Related

Make a list of all files in two folders then iterate through the combined list randomly

I have two directories with photos that I want to manipulate to output a random order of the files each time a script is run. How would I create such a list?
d1=/home/Photos/*.jpg
d2=/mnt/JillsPC/home/Photos/*.jpg
# somehow make a combined list, files = d1 + d2
# somehow randomise the file order
# during execution of the for;do;done loop, no file should be repeated
for f in $files; do
echo $f # full path to each file
done

I wouldn't use variables if you don't have to. It's more natural if you chain a couple of commands together with pipes or process substitution. That way everything operates on streams of data without loading the entire list of names into memory all at once.
You can use shuf to randomly permute input lines, and find to list files one per line. Or, to be maximally safe, let's use \0 separators. Finally, a while loop with process substitution reads line by line into a variable.
while IFS= read -d $'\0' -r file; do
echo "$file"
done < <(find /home/Photos/ /mnt/JillsPC/home/Photos/ -name '*.jpg' -print0 | shuf -z)
That said, if you do want to use some variables then you should use arrays. Arrays handle file names with whitespace and other special characters correctly, whereas regular string variables muck them all up.
d1=(/home/Photos/*.jpg)
d2=(/mnt/JillsPC/home/Photos/*.jpg)
files=("${d1[#]}" "${d2[#]}")
Iterating in order would be easy:
for file in "${files[#]}"; do
echo "$file"
done
Shuffling is tricky though. shuf is still the best tool but it works best on a stream of data. We can use printf to print each file name with the trailing \0 we need to make shuf -z happy.
d1=(/home/Photos/*.jpg)
d2=(/mnt/JillsPC/home/Photos/*.jpg)
files=("${d1[#]}" "${d2[#]}")
while IFS= read -d $'\0' -r file; do
echo "$file"
done < <(printf '%s\0' "${files[#]}" | shuf -z)
Further reading:
How can I read a file (data stream, variable) line-by-line (and/or field-by-field)?
How can I find and safely handle file names containing newlines, spaces or both?
I set variables in a loop that's in a pipeline. Why do they disappear after the loop terminates? Or, why can't I pipe data to read?
How can I randomize (shuffle) the order of lines in a file? Or select a random line from a file, or select a random file from a directory?

I came up with this solution after some more reading:
files=(/home/roy/Photos/*.jpg /mnt/JillsPC/home/jill/Photos/*.jpg)
printf '%s\n' "${files[#]}" | sort -R
Edit: updated with John's improvements from comments.
You can add any number of directories into an array declaration (though see caveat with complex names in comments).
sort -R seems to use shuf internally from looking at it's man page.
This was the original, which works, but is not as robust as the above:
files=(/home/roy/Photos/*.jpg /mnt/JillsPC/home/jill/Photos/*.jpg)
(IFS=$'\n'; echo "${files[*]}") | sort -R
With IFS=$'\n', echoing the array will display it line by line (IFS=$'somestring' is syntax for string literals with escape sequences. So unlike '\n', $'\n' is the correct way to set it to a line break). IFS is not needed when using the printf method above.
echo ${files[*]} will print out all array elements at once, using the IFS defined in

Loop thru a filename list and iterate thru a variable/array removing all strings from filenames with bash

I have a list of strings that I have in a variable and would like to remove those strings from a list of filenames. I'm pulling that string from a file that I can add to and modify over time. Some of the strings in the variable may include part of the item needed to be removed while the other may be another line in the list. Thats why I need to loop thru the entire variable list.
I'm familiar using a while loop to loop thru a list but not sure how I can loop thru each line to remove all strings from that filename.
Here's an example:
getstringstoremove=$(cat /text/from/some/file.txt)
echo "$getstringstoremove"
# Or the above can be an array
getstringstoremove=$(cat /text/from/some/file.txt)
declare -a arr=($getstringstoremove)
the above 2 should return the following lines
-SOMe.fil
(Ena)M-3_1
.So[Me].filEna)M-3_2
SOMe.fil(Ena)M-3_3
Here's the loop I was running to grab all filenames from a directory and remove anything other than the filenames
ls -l "/files/in/a/folder/" | awk -v N=9 '{sep=""; for (i=N; i<=NF; i++) {printf("%s%s",sep,$i); sep=OFS}; printf("\n")}' | while read line; do
echo "$line"
returns the following result after each loop
# 1st loop
ilikecoffee1-SOMe.fil(Ena)M-3_1.jpg
# iterate thru $getstringstoremove to remove all strings from the above file.
# 2nd loop
ilikecoffee2.So[Me].filEna)M-3_2.jpg
# iterate thru $getstringstoremove again
# 3rd loop
ilikecoffee3SOMe.fil(Ena)M-3_3.jpg
# iterate thru $getstringstoremove and again
done
the final desired output would be the following
ilikecoffee1.jpg
ilikecoffee2.jpg
ilikecoffee3.jpg
I'm running this in bash on Mac.
I hope this makes sense as I'm stuck and can use some help.
If someone has a better way of doing this by all means it doesn't have to be the way I have it listed above.

You can get the new filenames with this awk one-liner:
$ awk 'NR==FNR{a[$0];next} {for(i in a){n=index($0,i);if(n){$0=substr($0,0,n-1)substr($0,n+length(i))}}} 1' rem.txt files.lst
This assumes your exclusion strings are in rem.txt and there's a files list in files.lst.
Spaced out for easier commenting:
NR==FNR { # suck the first file into the indices of an array,
a[$0]
next
}
{
for (i in a) { # for each file we step through the array,
n=index($0,i) # search for an occurrence of this string,
if (n) { # and if found,
$0=substr($0,0,n-1)substr($0,n+length(i))
# rewrite the line with the string missing,
}
}
}
1 # and finally, print the line.
If you stow the above script in a file, say foo.awk, you could run it as:
$ awk -f foo.awk rem.txt files.lst
to see the resultant files.
Note that this just shows you how to build new filenames. If what you want is to do this for each file in a directory, it's best to avoid running your renames directly from awk, and use shell constructs designed for handling files, like a for loop:
for f in path/to/*.jpg; do
mv -v "$f" "$(awk -f foo.awk rem.txt - <<<"$f")"
done
This should be pretty obvious except perhaps for the awk options, which are:
-f foo.awk, use the awk script from this filename,
rem.txt, your list of removal strings,
-, a hyphen indicating that standard input should be used IN ADDITION to rem.txt, and
<<<"$f", a "here-string" to provide that input to awk.
Note that this awk script will work with both gawk and the non-GNU awk that is included in macos.

I think I have understood what you mean, and I would do it with Perl which comes built-in to the standard macOS - so nothing to install.
I assume you have a file called remove.txt with your list of stuff to remove, and that you want to run the script on all files in your current directory. If so, the script would be:
#!/usr/local/bin/perl -w
use strict;
# Load the strings to remove into array "strings"
my #strings = `cat remove.txt`;
for(my $i=0;$i<$#strings;$i++){
# Strip carriage returns and quote metacharacters - e.g. *()[]
chomp($strings[$i]);
$strings[$i] = quotemeta($strings[$i]);
}
# Iterate over all filenames
my #files = glob('*');
foreach my $file (#files){
my $new = $file;
# Iterate over replacements
foreach my $string (#strings){
$new =~ s/$string//;
}
# Check if name would change
if($new ne $file){
if( -f $new){
printf("Cowardly refusing to rename %s as %s since it involves overwriting\n",$file,$new);
} else {
printf("Rename %s as %s\n",$file,$new);
# rename $file,$new;
}
}
}
Then save that in your HOME directory as renamer. Make it executable - only necessary once - with this command in Terminal:
chmod +x $HOME/renamer
Then you can go in any directory where you madly named files are and run the script like this:
cd path/to/mad/files
$HOME/renamer
As with all things you download off the Internet, make a backup first and just run on a small, copied, subset of your files till you get the idea of how it works.

If you use homebrew as your package manager, you could install rename using:
brew install rename
You could then take all the Perl from my other answer and condense it down to a couple of lines and embed it in a rename command which would give you the added benefit of being able to do dry-runs etc. The code below does exactly the same as my other answer but is somewhat harder to read for non_perl folk.
Your command would simply be:
rename --dry-run '
my #strings = map { s/\r|\n//g; $_=quotemeta($_) } `cat remove.txt`;
foreach my $string (#strings){ s/$string//; } ' *
Sample Output
'ilikecoffee(Ena)M-3_1' would be renamed to 'ilikecoffee'
'ilikecoffee-SOMe.fil' would be renamed to 'ilikecoffee'
'ilikecoffee.So[Me].filEna)M-3_2' would be renamed to 'ilikecoffee'
To try and understand it, remember:
the rename part applies the following Perl to each file because of the asterisk at the end
the #strings part reads all the strings from the file remove.txt and removes any carriage returns and linefeeds from them and quotes any metacharacters
the foreach applies each of the deletions to the current filename which rename stores in $_ for you
Note that this method trades simplicity for performance somewhat. If you have millions of files to do, the other method will be quicker because here I read the remove.txt file for each and every file whose name is checked, but if you only have a few hundred/thousand files, I doubt you'll notice it.
This should be much the same, just shorter:
rename --dry-run '
my #strings = `cat remove.txt`; chomp #strings;
foreach my $string (#strings){ s/\Q$string\E//; } ' *

How to store a directory's contents inside of an array

I want to be able to store a directory's contents inside of an array. I know that you can use:
#!/bin/bash
declare -A array
for i in Directory; do
array[$i]=$i
done
to store directory contents in an associative array. But if there are subdirectories inside of a directory, I want to be able to store them and their contents inside of the same array. I also tried using:
declare -A arr1
find Directory -print0 | while read -d $'\0' file; do
arr1[$file]=$file
echo "${arr1[$file]}"
done
but this just runs into the problem where the array contents vanish once the while loop ends due to the subshell being discarded from the pipeline (not sure if I'm describing this correctly).
I even tried the following:
for i in $(find Directory/*); do
arr2[$i]="$i"
echo $i
done
but the output is a total disaster for files containing any spaces.
How can I store both a directory and all of its subdirectories (and their subdirectories if need be) inside of a single array?

So you know, you don't need associative arrays. A simpler way to add an element to a regular indexed array is:
array+=("$value")
Your find|while approach is on the right track. As you've surmised, you need to get rid of the pipeline. You can do that with process substitution:
while read -d $'\0' file; do
arr1+=("$file")
done < <(find Directory -print0)
Another way to do this without find is with globbing. If you just want the files under Directory it's as simple as:
array=(Directory/*)
If you want to recurse through all of its subdirectories as well, you can enable globstar and use **:
shopt -s globstar
array=(Directory/**)
The globbing methods are really nice because they automatically handle file names with whitespace and other special characters.

Unique file names in a directory in unix

I have a capture file in a directory in which some logs are being written in a file
word.cap
now there is a script in which when its size becomes exactly 1.6Gb then it clears itself and prepares files in below format in same directory-
word.cap.COB2T_1389889231
word.cap.COB2T_1389958275
word.cap.COB2T_1390035286
word.cap.COB2T_1390132825
word.cap.COB2T_1390213719
Now i want to pick all these files in a script one by one and want to perform some actions.
my script is-
today=`date +%d_%m_%y`
grep -E '^IPaddress|^Node' /var/rawcap/word.cap.COB2T* | awk '{print $3}' >> snmp$today.txt
sort -u snmp$today.txt > snmp_final_$today.txt
so, what should i write to pick all file names of above mentioned format one by one as i will place this script in crontab,but i don't want to read main word.cap file as that is being edited.

As per your comment:
Thanks, this is working but i have a small issue in this. There are
some files which are bzipped i.e. word.cap.COB2T_1390213719.bz2, so i
dont want these files in list, so what should be done?
You could add a condition inside the loop:
for file in word.cap.COB2T*; do
if [[ "$file" != *.bz2 ]]; then
# Do something here
echo ${file};
fi
done

How do i make multiple copies of a single file with different filenames?

I want to create a windows batch file (Win7) to achieve the following:
Copy source.doc to destination with destinationFilename.doc taken from a list in a text file (nameList.txt)
I have batch file that will make directories from nameList.txt but I can't figure out how to modify the batch file to make it copy source.doc in the required manner.

Using xargs you can process a file nameList.txt which contains a newline separated list of target filenames like this:
cat nameList.txt | xargs -I "F" cp source.doc F
where -I "F" defines F as a placeholder to be used in command invocation of cp.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight

How can I remove lines from .txt files using a loop? - file

Easy cheasy: sed -i "s/^.string.//" *.txt this will remove any line containing 'string' on each .txt file

You use it along with read to get each filename in turn, after piping the results of find to it. Then you just pass the filename to sed to delete the lines you're interested in.

Related

Make a list of all files in two folders then iterate through the combined list randomly

Loop thru a filename list and iterate thru a variable/array removing all strings from filenames with bash

How to store a directory's contents inside of an array

Unique file names in a directory in unix

How do i make multiple copies of a single file with different filenames?

Categories

Resources

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight

How can I remove lines from .txt files using a loop? - file

Easy cheasy: sed -i "s/^.*string.*//" *.txt this will remove any line containing 'string' on each .txt file

You use it along with read to get each filename in turn, after piping the results of find to it. Then you just pass the filename to sed to delete the lines you're interested in.

Related

Make a list of all files in two folders then iterate through the combined list randomly

Loop thru a filename list and iterate thru a variable/array removing all strings from filenames with bash

How to store a directory's contents inside of an array

Unique file names in a directory in unix

How do i make multiple copies of a single file with different filenames?

Categories

Resources

Easy cheasy: sed -i "s/^.string.//" *.txt this will remove any line containing 'string' on each .txt file