How to exclude rows from a text file in a for loop

I have a text file (file1.txt) with multiple lines of data.
I use this text file to copy files from directory A to another directory B: my script checks whether an expression from it is included in the name of a file stored in A.
In directory A, I grep another text file (file2.txt) to get information (rows like [bla][0-9][0-9][bla][0-9][0-9]) that I want to exclude in my script.
set x = `grep '[bla][0-9][0-9][bla][0-9][0-9]' file1.txt`
foreach i ( $x )
cp A/*$i* B/.
end
For example, rows in file1.txt:
bla11bla11
bla12bla12
bla13bla13
bla14bla14
bla15bla15
and the grep result from file2.txt, which has to be excluded from the loop:
bla11bla11
bla12bla12
In the end, my script should only use the following lines:
bla13bla13
bla14bla14
bla15bla15
How can I do this?

A simple nested loop does it completely in csh:
#!/bin/csh -f
set f = `grep 'bla[0-9][0-9]bla[0-9][0-9]' file1.txt`
set x = `grep 'bla[0-9][0-9]bla[0-9][0-9]' file2.txt`
echo "Files: $f"
echo "To be excluded from Files: $x"
set r = ( )
foreach i ( $f )
set skip = 0
foreach j ( $x )
if ("$i" == "$j") then
set skip = 1
break
endif
end
if ($skip == 0) set r = ($r $i)
end
echo "Result: $r"
The output when run on your example files above:
Files: bla11bla11 bla12bla12 bla13bla13 bla14bla14 bla15bla15
To be excluded from Files: bla11bla11 bla12bla12
Result: bla13bla13 bla14bla14 bla15bla15
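For comparison, the same filtering can be sketched in Python as an order-preserving set difference, followed by the copy step from the question. This is a minimal sketch, assuming one token per line as in the example files; read_tokens, filter_excluded and copy_matching are hypothetical names. Unlike grep, which returns whole matching lines, findall returns each match of the (corrected) pattern.

```python
import re
import shutil
from pathlib import Path

# Same pattern as the corrected grep: "bla", two digits, "bla", two digits
PATTERN = re.compile(r"bla[0-9]{2}bla[0-9]{2}")


def read_tokens(path):
    """Return every match of PATTERN in the file, in file order."""
    return PATTERN.findall(Path(path).read_text())


def filter_excluded(tokens, excluded):
    """Keep the tokens that are not in `excluded`, preserving their order."""
    skip = set(excluded)  # constant-time lookup replaces the inner csh loop
    return [t for t in tokens if t not in skip]


def copy_matching(tokens, src="A", dst="B"):
    """Copy every file in `src` whose name contains a token (like cp A/*$i* B/.)."""
    Path(dst).mkdir(parents=True, exist_ok=True)
    for t in tokens:
        for f in Path(src).glob(f"*{t}*"):
            if f.is_file():
                shutil.copy2(f, Path(dst) / f.name)
```

Usage: copy_matching(filter_excluded(read_tokens("file1.txt"), read_tokens("file2.txt"))).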

Related

Looking for MORE/MOVE solutions that can handle files with more than 65534 rows

I have numerous uniquely named .CSV files that I need to remove the first 17 lines from. Some of these files exceed 65534 rows so my MORE/MOVE Batch script is not working. Looking for alternative solutions.
@echo off
for %%a in (*.csv) do (
more +17 "%%a" >"%%a.new"
move /y "%%a.new" "%%a" >nul
)
Regardless of the number of input rows, I want the 17 header rows removed and a new file built with all the remaining rows.
Here's a PowerShell option; it uses a stream to cope with your large files:
$csvs = Get-ChildItem -Path "P:\ath to\your csvs" -Filter *.csv
foreach ( $csv in $csvs ) {
$fin = New-Object System.IO.StreamReader( $csv.FullName )
$fout = New-Object System.IO.StreamWriter( $csv.FullName+".new" )
try {
for( $s = 1; $s -le 17 -and !$fin.EndOfStream; $s++ ) {
$null = $fin.ReadLine() # discard a header line without printing it
}
while( !$fin.EndOfStream ) {
$fout.WriteLine( $fin.ReadLine() )
}
}
finally {
$fout.Close()
$fin.Close()
}
}
Just change the path to your .csvs on the first line, before testing it.
I have purposely left out the deletion of the original files, simply appending .new to the new filenames to allow you time to check the results, test the speed etc. I will leave it to you to include a Rename/Delete or Move should you feel the need to extend the functionality.
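The same streaming idea can be sketched in Python: itertools.islice skips the first 17 lines lazily and the rest is copied line by line, so the row count does not matter. A minimal sketch; strip_header and strip_all are hypothetical names, and like the PowerShell version it writes a .new file and leaves the original in place. It assumes the platform's default text encoding.

```python
from itertools import islice
from pathlib import Path

HEADER_LINES = 17  # number of leading rows to drop, as in the question


def strip_header(path, skip=HEADER_LINES):
    """Stream `path` into `path + '.new'`, dropping the first `skip` lines."""
    src = Path(path)
    dst = src.with_name(src.name + ".new")
    # newline="" keeps the original line endings untouched
    with src.open("r", newline="") as fin, dst.open("w", newline="") as fout:
        # islice(fin, skip, None) skips `skip` lines, then yields the rest lazily
        fout.writelines(islice(fin, skip, None))
    return dst


def strip_all(directory):
    """Apply strip_header to every .csv file in `directory`; originals are kept."""
    return [strip_header(p) for p in sorted(Path(directory).glob("*.csv"))]
```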
Here's a one-line solution:
for %%a in (*.csv) do powershell -Com "sc -Path '%%a' -Value (gc '%%a' | select -Skip 17)"
where gc and sc are the default aliases for Get-Content and Set-Content respectively. See also:
Powershell select-object skip multiple lines?
Powershell skip first 2 lines of txt file when importing it
If your files are huge, it's better to read them line by line or in blocks, which is also easy to implement with file functions such as [IO.File]::OpenText or with the -ReadCount option of Get-Content in PowerShell:
Reading large text files with Powershell
Reading very BIG text files using PowerShell
How to process a file in PowerShell line-by-line as a stream
How can I make this PowerShell script parse large files faster?
As Squashman mentioned, for /f also has an option to skip lines at the beginning of the file:
for %%a in (*.csv) do (
for /f "usebackq skip=17 delims=" %%l in ("%%a") do @echo(%%l>>"%%a.new"
move /y "%%a.new" "%%a" >nul
)
But that won't work if your file contains lines with special characters like & or |, and for /f also skips empty lines. For more information, run for /?
Make your own cut command. This is VBScript ported to VB.NET.
Cut
cut {t|b} {i|x} NumOfLines
Cuts the number of lines from the top or bottom of file.
t - top of the file
b - bottom of the file
i - include n lines
x - exclude n lines
Example
cut t i 5 < "%systemroot%\win.ini"
Cut.bat
REM Cut.bat
REM This file compiles Cut.vb to Cut.exe
REM Cut.exe keeps or removes the specified number of lines at the top or bottom of StdIn and writes the result to StdOut
REM To use
REM cut {t|b} {i|x} NumOfLines
Rem Cuts the number of lines from the top or bottom of file.
Rem t - top of the file
Rem b - bottom of the file
Rem i - include n lines
Rem x - exclude n lines
Rem
Rem Example - Includes first 5 lines Win.ini
Rem
Rem cut t i 5 < "%systemroot%\win.ini"
"C:\Windows\Microsoft.NET\Framework\v4.0.30319\vbc.exe" /target:exe /out:"%~dp0\Cut.exe" "%~dp0\Cut.vb" /verbose
pause
Cut.vb
'Cut.vb
Imports System
Imports System.IO
Imports System.Runtime.InteropServices
Imports Microsoft.Win32
Public Module DeDup
Sub Main
Dim Arg() As Object
Dim RS as Object
Dim LineCount as Object
Dim Line as Object
Arg = Split(Command(), " ")
rs = CreateObject("ADODB.Recordset")
With rs
.Fields.Append("LineNumber", 4)
.Fields.Append("Txt", 201, 5000)
.Open
LineCount = 0
Line=Console.readline
Do Until Line Is Nothing
LineCount = LineCount + 1
.AddNew
.Fields("LineNumber").value = LineCount
.Fields("Txt").value = Line
.UpDate
Line = Console.ReadLine
Loop
.Sort = "LineNumber ASC"
If LCase(Arg(0)) = "t" then
If LCase(Arg(1)) = "i" then
.filter = "LineNumber < " & LCase(Arg(2)) + 1
ElseIf LCase(Arg(1)) = "x" then
.filter = "LineNumber > " & LCase(Arg(2))
End If
ElseIf LCase(Arg(0)) = "b" then
If LCase(Arg(1)) = "i" then
.filter = "LineNumber > " & LineCount - LCase(Arg(2))
ElseIf LCase(Arg(1)) = "x" then
.filter = "LineNumber < " & LineCount - LCase(Arg(2)) + 1
End If
End If
Do While not .EOF
Console.writeline(.Fields("Txt").Value)
.MoveNext
Loop
End With
End Sub
End Module
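The VB.NET program buffers every line in an ADODB recordset before filtering. As a point of comparison, here's a minimal Python sketch of the same t/b and i/x semantics that streams instead (a different technique, not a port of the code above): the top modes never buffer, and the bottom modes hold at most n lines. The script name cut.py and the function name cut are hypothetical.

```python
import sys
from collections import deque
from itertools import islice


def cut(lines, where, mode, n):
    """Keep (i) or drop (x) n lines at the top (t) or bottom (b) of `lines`."""
    it = iter(lines)
    if (where, mode) == ("t", "i"):
        yield from islice(it, n)            # first n lines only
    elif (where, mode) == ("t", "x"):
        yield from islice(it, n, None)      # skip the first n, stream the rest
    elif (where, mode) == ("b", "i"):
        yield from deque(it, maxlen=n)      # the deque keeps only the last n lines
    elif (where, mode) == ("b", "x"):
        held = deque()                      # hold back n lines; they are the bottom
        for line in it:
            held.append(line)
            if len(held) > n:
                yield held.popleft()
    else:
        raise ValueError("usage: cut {t|b} {i|x} NumOfLines")


if __name__ == "__main__" and len(sys.argv) == 4:
    # e.g.  python cut.py t i 5 < "%systemroot%\win.ini"
    where, mode, count = sys.argv[1].lower(), sys.argv[2].lower(), int(sys.argv[3])
    sys.stdout.writelines(cut(sys.stdin, where, mode, count))
```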

Multidimensional array in bash via associative array

I'm trying to load files from a directory into an associative array, with access like "FND,4", where FND is the basename of the file and 4 is the line number:
loadFiles() {
local iter
local comname
local lines
echo "# Loading files"
find ./sys -type f | while read iter
do
comname=$(basename "$iter" .aic)
echo "# $comname"
local i
i=0
while IFS= read -r line
do
commands["$comname,$i"]="$line"
#echo "$comname,$i = ${commands[$comname,$i]}"
((i++))
done < "$iter"
[[ -n $line ]] && commands["$comname,$i"]="$line"
done
}
loadFiles
echo "POP,4 = ${commands[POP,4]}"
I'm getting nothing, even though the ./sys/dir/POP.aic file exists and has a 4th line. The commented-out echo inside the loop shows that the value is assigned.
Can anyone, please, help and show me where I'm wrong?
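Found the root of the problem: the subshell. In echo "1 2 3" | while <...>, the while loop runs in a new subshell, so variables set inside it exist only in that subshell and are gone when the loop finishes. The solution is to use while <...> done < <(find ./sys -type f), so the loop runs in the current shell. Also make sure commands is declared with declare -A commands before loadFiles runs; otherwise bash treats it as an indexed array and evaluates keys like "POP,4" arithmetically.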
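For comparison, a dict with tuple keys models the same "FND,4" lookup in Python. Because the loop runs in the calling process rather than in a pipeline subshell, the filled-in mapping is simply returned. A minimal sketch, assuming the ./sys layout and .aic suffix from the question; load_files is a hypothetical name. The indices start at 0, as in the bash loop, so index 4 is the fifth line.

```python
from pathlib import Path


def load_files(root="./sys", suffix=".aic"):
    """Map (basename, line_index) -> line text for every file under `root`."""
    commands = {}
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        name = path.name
        if suffix and name.endswith(suffix):
            name = name[: -len(suffix)]  # like basename "$iter" .aic
        with path.open() as fh:
            # enumerate also yields a final line that has no trailing newline
            for i, line in enumerate(fh):
                commands[(name, i)] = line.rstrip("\n")
    return commands
```

Usage: load_files()[("POP", 4)] plays the role of ${commands[POP,4]}.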

Creating Arrays From Unknown Number of External Lists

This is another question related to importing values from a text file (similar to one of my previous ones), but with added complexity (the more I learn about bash scripting, the more challenging it becomes).
The goal: to create an array from each Day_... list on each outer loop iteration, without assuming any knowledge of how many Day_... lists exist in the *.txt file.
The issue: at the moment my inner loop only iterates once (it should iterate as many times as there are elements in Monday). Also, I'm using my_sub_dom=$( sed 's/=.*//' weekly.txt ) to get the list of arrays in weekly.txt and then filter the ones that contain Day.
Bash script:
#!/bin/bash
source weekly.txt
declare -a my_sub_dom
day=( ${Monday[*]} )
my_sub_dom=$( sed 's/=.*//' weekly.txt ) # to construct a list of the number of of lists in the text file
#echo "${my_sub_dom}"
counter=0
main_counter=0
for i in "${day[@]}"
do
let main_counter=main_counter+1
for j in "${my_sub_dom[@]}"
do
# echo "$j"
if grep -q "Day" "${my_sub_dom}"
then
echo "$j"
sub_array_name="${my_sub_dom[counter]}" # storing the list name
sub_array_content=( ${sub_array_name[*]} )
echo "${sub_array_content}"
else
echo "no"
fi
let counter=counter+1
done
echo "$counter"
counter=0
done
echo "$main_counter"
Text file format:
Day_Mon=( "google" "yahoo" "amazon" )
Day_Tu=( "cnn" "msnbc" "google" )
Day_Wed=( "nytimes" "fidelity" "stackoverflow" )
Monday= ( "one" "two" "three" )
....
Script output:
grep: Day_Mon
Day_Tu
Day_Wed
Monday: No such file or directory
no
1
grep: Day_Mon
Day_Tu
Day_Wed
Monday: No such file or directory
no
1
grep: Day_Mon
Day_Tu
Day_Wed
Monday: No such file or directory
no
1
3
Please let me know if you'd like any other information. I really appreciate any input on this; I've been trying to solve it for a couple of days now.
Thank you
Given a file weekly.txt containing
Day_Mon=( "google" "yahoo" "amazon" )
Day_Tu=( "cnn" "msnbc" "google" )
Day_Wed=( "nytimes" "fidelity" "stackoverflow" )
Monday=( "one" "two" "three" )
you can loop through each array whose name starts with Day_ like this:
#!/bin/bash
# Set all arrays defined in the file
source weekly.txt
# Get all variables prefixed with "Day_" (bash 4)
for name in "${!Day_@}"
do
echo "The contents of array $name is: "
# Use indirection to expand the array
arrayexpansion="$name[@]"
for value in "${!arrayexpansion}"
do
echo "-- $value"
done
echo
done
This results in:
The contents of array Day_Mon is:
-- google
-- yahoo
-- amazon
The contents of array Day_Tu is:
-- cnn
-- msnbc
-- google
The contents of array Day_Wed is:
-- nytimes
-- fidelity
-- stackoverflow
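For comparison, here's a Python sketch that reads the same weekly.txt without executing it: each one-line NAME=( ... ) assignment is matched with a regular expression, the quoted values are split with shlex.split, and only names starting with Day_ are kept. It assumes every array fits on one line, as in the example; parse_arrays and day_arrays are hypothetical names.

```python
import re
import shlex

# NAME=( "a" "b" ... ) on one line; whitespace around "=" is tolerated
ASSIGN = re.compile(r'^\s*([A-Za-z_][A-Za-z0-9_]*)\s*=\s*\((.*)\)\s*$')


def parse_arrays(text):
    """Return {name: [values]} for each one-line bash array assignment in `text`."""
    arrays = {}
    for line in text.splitlines():
        m = ASSIGN.match(line)
        if m:
            # shlex.split removes the double quotes around each value
            arrays[m.group(1)] = shlex.split(m.group(2))
    return arrays


def day_arrays(text, prefix="Day_"):
    """Keep only arrays whose names start with `prefix`, like ${!Day_@}."""
    return {k: v for k, v in parse_arrays(text).items() if k.startswith(prefix)}
```

Looping over day_arrays(text).items() then gives the same name/value pairs that the bash indirection prints, in file order.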

populate and read an array with a list of filenames

Trivial question.
#!/bin/bash
if test -z "$1"
then
echo "No args!"
exit
fi
for newname in $(cat $1); do
echo $newname
done
I want to replace that echo inside the loop with array population code.
Then, after the loop ends, I want to read the array again and echo the contents.
Thanks.
If the file, as your code suggests, contains one filename per line, you can assign its contents to the array as follows:
array=(`cat $1`)
After that, to process every element you can do something like:
for i in ${array[@]} ; do echo "file = $i" ; done
declare -a files
while IFS= read -r
do
files+=("$REPLY") # Array append
done < "$1"
echo "${files[*]}" # Print entire array separated by spaces
cat is not needed for this.
#!/bin/bash
files=( )
for f in $(cat $1); do
files[${#files[*]}]=$f
done
for f in ${files[@]}; do
echo "file = $f"
done
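The difference between the $(cat $1) loops and the while IFS= read -r loop shows up with names containing spaces: word splitting turns "my file.png" into two elements. A minimal Python sketch of the one-element-per-line behavior; read_filenames and show are hypothetical names.

```python
def read_filenames(path):
    """Return one element per line, keeping spaces (no word splitting)."""
    with open(path) as fh:
        return [line.rstrip("\n") for line in fh]


def show(files):
    """Print each element, like echo "file = $f"."""
    for f in files:
        print(f"file = {f}")
```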

Batch rename sequential files by padding with zeroes

I have a bunch of files named like so:
output_1.png
output_2.png
...
output_10.png
...
output_120.png
What is the easiest way of renaming those to match a convention, e.g. zero-padded to four digits, so that the files are named:
output_0001.png
output_0002.png
...
output_0010.png
output_0120.png
This should be easy in Unix/Linux/BSD, although I also have access to Windows. Any language is fine, but I'm interested in some really neat one-liners (if there are any?).
Python
import os
path = '/path/to/files/'
for filename in os.listdir(path):
    prefix, num = filename[:-4].split('_')
    num = num.zfill(4)
    new_filename = prefix + "_" + num + ".png"
    os.rename(os.path.join(path, filename), os.path.join(path, new_filename))
You could also build the list of (old, new) names first, assuming that every file that starts with "output_" and ends with ".png" should be renamed:
l = [(x, "output_" + x[7:-4].zfill(4) + ".png") for x in os.listdir(path) if x.startswith("output_") and x.endswith(".png")]
for oldname, newname in l:
    os.rename(os.path.join(path, oldname), os.path.join(path, newname))
Bash
(from: http://www.walkingrandomly.com/?p=2850)
In other words I replace file1.png with file001.png and file20.png with file020.png and so on. Here’s how to do that in bash
#!/bin/bash
num=`expr match "$1" '[^0-9]*\([0-9]\+\).*'`
paddednum=`printf "%03d" $num`
echo ${1/$num/$paddednum}
Save the above to a file called zeropad.sh and then run the following command to make it executable:
chmod +x ./zeropad.sh
You can then use the zeropad.sh script as follows:
./zeropad.sh frame1.png
which will return the result
frame001.png
All that remains is to use this script to rename all of the .png files in the current directory so that they are zero-padded:
for i in *.png;do mv $i `./zeropad.sh $i`; done
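The same first-number padding can be sketched in Python: re.sub with count=1 and str.zfill play the roles of expr match, printf %03d and ${1/$num/$paddednum}. A minimal sketch; zeropad and zeropad_dir are hypothetical names, and unlike the mv loop it refuses to overwrite an existing file.

```python
import re
from pathlib import Path


def zeropad(name, width=3):
    """Zero-pad the first run of digits in `name`, like zeropad.sh does with %03d."""
    # count=1: only the first number changes, as with ${1/$num/$paddednum}
    return re.sub(r"\d+", lambda m: m.group(0).zfill(width), name, count=1)


def zeropad_dir(directory, width=3, pattern="*.png"):
    """Rename every matching file whose padded name differs from its current name."""
    for path in sorted(Path(directory).glob(pattern)):
        target = path.with_name(zeropad(path.name, width))
        if target == path:
            continue
        if target.exists():
            raise FileExistsError(f"{target} already exists")
        path.rename(target)
```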
Perl
(from: Zero pad rename e.g. Image (2).jpg -> Image (002).jpg)
use strict;
use warnings;
use File::Find;
sub pad_left {
my $num = shift;
if ($num < 10) {
$num = "00$num";
}
elsif ($num < 100) {
$num = "0$num";
}
return $num;
}
sub new_name {
if (/\.jpg$/) {
my $name = $File::Find::name;
my $new_name;
($new_name = $name) =~ s/^(.+\/[\w ]+\()(\d+)\)/$1 . &pad_left($2) .')'/e;
rename($name, $new_name);
print "$name --> $new_name\n";
}
}
chomp(my $localdir = `pwd`);# invoke the script in the parent-directory of the
# image-containing sub-directories
find(\&new_name, $localdir);
Rename
Also from the answer above:
rename 's/\d+/sprintf("%04d",$&)/e' *.png
Fairly easy, although it combines a few features not immediately obvious:
@echo off
setlocal enableextensions enabledelayedexpansion
rem iterate over all PNG files:
for %%f in (*.png) do (
rem store file name without extension
set FileName=%%~nf
rem strip the "output_"
set FileName=!FileName:output_=!
rem Add leading zeroes:
set FileName=000!FileName!
rem Trim to only four digits, from the end
set FileName=!FileName:~-4!
rem Add "output_" and extension again
set FileName=output_!FileName!%%~xf
rem Rename the file
rename "%%f" "!FileName!"
)
Edit: I misread; you're not after a batch file specifically but a solution in any language. Sorry for that. To make up for it, here's a PowerShell one-liner:
gci *.png|%{rni $_ ('output_{0:0000}.png' -f +($_.basename-split'_')[1])}
Insert ?{$_.basename-match'_\d+'}| right after gci *.png| if you have other files that do not follow that pattern.
I actually just needed to do this on OSX. Here's the script I created for it, as a single line:
> for i in output_*.png;do mv $i `printf output_%04d.png $(echo $i | sed 's/[^0-9]*//g')`; done
For mass renaming, the only safe solution is mmv: it checks for collisions and allows renaming in chains and cycles, something that is beyond most scripts. Unfortunately, it isn't very good at zero padding. A flavour:
c:> mmv output_[0-9].png output_000#1.png
Here's one workaround:
c:> type file
mmv
[^0-9][0-9] #1\00#2
[^0-9][0-9][^0-9] #1\00#2#3
[^0-9][0-9][0-9] #1\0#2#3
[^0-9][0-9][0-9][^0-9] #1\0#2#3
c:> mmv <file
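The idea behind that safety can be sketched in Python (this is not how mmv itself is implemented): compute the whole old-to-new mapping first, refuse to run if two files would get the same name or a target already exists outside the set being renamed, and then rename through unique temporary names so that chains and cycles cannot clobber anything. The function names plan and safe_rename, the output_N.png pattern and the four-digit width are assumptions taken from the question.

```python
import re
import uuid
from pathlib import Path


def plan(directory, width=4):
    """Map each output_N.png to output_000N.png (zero-padded to `width` digits)."""
    mapping = {}
    for p in Path(directory).glob("output_*.png"):
        m = re.fullmatch(r"output_(\d+)\.png", p.name)
        if m:
            mapping[p] = p.with_name(f"output_{int(m.group(1)):0{width}d}.png")
    return mapping


def safe_rename(mapping, dry_run=False):
    """Apply all renames at once, refusing collisions; safe for chains and cycles."""
    targets = list(mapping.values())
    if len(set(targets)) != len(targets):
        raise ValueError("two files would get the same new name")
    sources = set(mapping)
    for t in targets:
        if t.exists() and t not in sources:
            raise FileExistsError(f"{t} exists and is not being renamed")
    if dry_run:
        return
    # Phase 1: move every source out of the way under a unique temporary name
    temps = {}
    for src, dst in mapping.items():
        tmp = src.with_name(f".tmp-{uuid.uuid4().hex}")
        src.rename(tmp)
        temps[tmp] = dst
    # Phase 2: nothing occupies the final names any more, so move into place
    for tmp, dst in temps.items():
        tmp.rename(dst)
```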
Here is a Python script I wrote that pads zeroes depending on the largest number present and ignores non-numbered files in the given directory. Usage:
python ensure_zero_padding_in_numbering_of_files.py /path/to/directory
Body of script:
import argparse
import os
import re
import sys
def main(cmdline):
    parser = argparse.ArgumentParser(
        description='Ensure zero padding in numbering of files.')
    parser.add_argument('path', type=str,
                        help='path to the directory containing the files')
    args = parser.parse_args(cmdline)
    path = args.path
    # prefix, first run of digits followed by a dot, then the extension
    numbered = re.compile(r'(.*?)(\d+)\.(.*)')
    numbered_fnames = [fname for fname in os.listdir(path)
                       if numbered.search(fname)]
    if not numbered_fnames:
        return 0
    max_digits = max(len(numbered.search(fname).group(2))
                     for fname in numbered_fnames)
    for fname in numbered_fnames:
        _, prefix, num, ext, _ = numbered.split(fname, maxsplit=1)
        num = num.zfill(max_digits)
        new_fname = "{}{}.{}".format(prefix, num, ext)
        if fname != new_fname:
            os.rename(os.path.join(path, fname), os.path.join(path, new_fname))
            print("Renamed {} to {}".format(fname, new_fname))
        else:
            print("{} seems fine".format(fname))
if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))
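$ rename output_ output_0 output_?.png    # add one zero to names ending in 1 digit
$ rename output_ output_0 output_??.png   # add one zero to names ending in 2 digits
$ rename output_ output_0 output_???.png  # add one zero to names ending in 3 digits
Run them in this order: each command also pads the names produced by the previous one, so every number ends up with four digits. (This is the util-linux rename syntax, not the Perl rename used above.)
That's it!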
With bash parameter expansion:
Linux:
for f in *.png;do n=${f#*_};n=${n%.*};mv $f $(printf output_"%04d".png $n);done
Windows (bash):
for f in *.png;do n=${f#*_};mv $f $(printf output_"%08s" $n);done
I'm following on from Adam's solution for OSX.
Some gotchas I encountered in my scenario were:
I had a set of .mp3 files, so the sed was catching the '3' in the '.mp3' suffix. (I used basename instead of echo to rectify this)
My .mp3's had spaces within their names, E.g., "audio track 1.mp3", this was causing basename+sed to screw up a little bit, so I had to quote the "$i" parameter.
In the end, my conversion line looked like this:
for i in *.mp3 ; do mv "$i" `printf "track_%02d.mp3\n" $(basename "$i" .mp3 | sed 's/[^0-9]*//g')` ; done
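Both gotchas come down to taking the digits from the stem only, not from the whole file name. A minimal Python sketch; track_name is a hypothetical helper, and the track_ prefix and two-digit width follow the command above. As a side benefit, int() parses 08 or 09 as decimal, whereas bash's printf %d rejects them as invalid octal numbers.

```python
import re
from pathlib import PurePath


def track_name(filename, width=2, prefix="track_"):
    """Build prefix + zero-padded number from the digits in the stem only."""
    p = PurePath(filename)
    # p.stem drops ".mp3" (like basename "$i" .mp3), so its "3" is never picked up
    digits = re.sub(r"[^0-9]", "", p.stem)  # same as sed 's/[^0-9]*//g'
    if not digits:
        raise ValueError(f"no digits in {filename!r}")
    return f"{prefix}{int(digits):0{width}d}{p.suffix}"
```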
Using ls + awk + sh:
ls -1 | awk -F_ '{printf "%s%04d.png\n", "mv "$0" "$1"_", $2}' | sh
If you want to test the command before running it, just remove the | sh.
I just wanted to make a time-lapse movie using
ffmpeg -pattern_type glob -i "*.jpg" -s:v 1920x1080 -c:v libx264 output.mp4
and ran into a similar problem:
[image2 @ 000000000039c300] Pattern type 'glob' was selected but globbing is not supported by this libavformat build
Globbing is not supported by this build on Windows 7.
Also, with a file list like the one below, using %2d.jpg or %02d.jpg fails:
1.jpg
2.jpg
...
10.jpg
11.jpg
...
[image2 @ 00000000005ea9c0] Could find no file with path '%2d.jpg' and index in the range 0-4
%2d.jpg: No such file or directory
[image2 @ 00000000005aa980] Could find no file with path '%02d.jpg' and index in the range 0-4
%02d.jpg: No such file or directory
Here is my batch script to rename the files:
@echo off
setlocal enabledelayedexpansion
set i=1000000
set X=1
for %%a in (*.jpg) do (
set /a i+=1
set "filename=!i:~%X%!"
echo ren "%%a" "!filename!%%~xa"
ren "%%a" "!filename!%%~xa"
)
After renaming the 143,323 .jpg files, I could run:
ffmpeg -i %6d.jpg -s:v 1920x1080 -c:v libx264 output.mp4
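Here's a Python sketch of the same renumbering. It is an assumption that frames should follow the number in each name, so it sorts with a natural key (2.jpg before 10.jpg) rather than plain alphabetical order, and it renames through temporary names so an existing 000002.jpg is never overwritten. renumber_frames and natural_key are hypothetical names; the six-digit width matches the 000001.jpg names the batch script produces.

```python
import re
import uuid
from pathlib import Path


def natural_key(path):
    """Sort key that compares digit runs numerically, so 2.jpg sorts before 10.jpg."""
    return [int(tok) if tok.isdigit() else tok for tok in re.split(r"(\d+)", path.name)]


def renumber_frames(directory, width=6, pattern="*.jpg", start=1):
    """Rename frames to 000001.jpg, 000002.jpg, ... in natural order."""
    frames = sorted(Path(directory).glob(pattern), key=natural_key)
    # Phase 1: unique temporary names, so no final name is occupied
    temps = []
    for p in frames:
        tmp = p.with_name(f".tmp-{uuid.uuid4().hex}{p.suffix}")
        p.rename(tmp)
        temps.append(tmp)
    # Phase 2: sequential, zero-padded final names in the sorted order
    for n, tmp in enumerate(temps, start=start):
        tmp.rename(tmp.with_name(f"{n:0{width}d}{tmp.suffix}"))
    return len(temps)
```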
