Batch script that replaces static string in file with filename - batch-file

I have 3000 files in c:\data\, and I need to replace a static string in each of them with the name of the file. For example, in the file 12345678.txt there will be some records along with the string 99999999, and I want to replace 99999999 with the filename 12345678.
How can I do this using a batch script?

Try this:
replace_string="99999999"
for f in *.txt; do
    sed -i "s/${replace_string}/${f%.*}/g" "$f"
done
Explanation:
for f in *.txt; do ... done: loop over the *.txt files in the current directory.
sed -i ... file: edit the file in place (-i).
"s/pattern/replacement/g": substitute (s) the pattern with the replacement globally (g).
${f%.*}: the filename without its extension (via shell parameter expansion).
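To preview the substitutions before editing anything in place, a minimal dry-run sketch (drop -i and diff the sed output against each original; diff exits non-zero for files that would change, which is harmless in the loop):
for f in *.txt; do
    # show what would change, without touching the file
    sed "s/99999999/${f%.*}/g" "$f" | diff "$f" -
done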

With GNU tools:
find . -regex '.*/[0-9]+\.txt' -type f -exec gawk -i inplace '
BEGINFILE {f = FILENAME; sub(".*/", "", f); sub(/\..*/, "", f)}
{gsub(/\<99999999\>/, f); print}' {} +
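If your awk lacks GNU's -i inplace extension, a portable sketch of the same idea using sed and a temporary file (the -name pattern here is a looser stand-in for the regex above, and the loop assumes no newlines in file names):
find . -type f -name '[0-9]*.txt' | while IFS= read -r path; do
    # strip directory and the .txt suffix to get the replacement string
    f=$(basename "$path" .txt)
    sed "s/99999999/$f/g" "$path" > "$path.tmp" && mv "$path.tmp" "$path"
done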

Related

bash: awk all files that may have spaces under a directory, treating them as the same file

Trying to find all the files with a specific naming structure under a directory and all its subdirectories, and use awk to parse out the data that I want. I was able to get it to work as long as there were no spaces in the names of folders or files.
I first use find to find the files and put them in an array. Then I use the array as the filename for awk. But the array treats any space as an element separator, so it splits /Documents/Untitled Folder/file.txt into /Documents/Untitled and Folder/file.txt.
Is there any way to push files that may contain spaces? This is what I have so far, which works if there are no files/directories/subdirectories with spaces.
arrFindFiles=($(find . -name "f*.txt" | sed 's/\ /\\\ /g'))
arrData+=("$(awk -F , '{if($9$10!=NULL) a[$9$10$13]++ } END { for (b in a) { print b } }' ${arrFindFiles[#]})")
Any help would be greatly appreciated!
Let me guess.
Generate a temp file which lists all the txt files in the current folder.
$ find . -type f -name "f*.txt" > temp.txt
$ cat temp.txt
./b/f ab.txt
./b/fa.txt
./f a b.txt
./fab.txt
Then run an awk command to find the duplicate names.
awk -F \/ '{a=$0;b=$NF;gsub(/ /,"",$NF);c[$NF]=c[$NF]==""?a:c[$NF] OFS a;d[$NF]++}
END{for (i in d) if (d[i]>1) print "found duplicate name: \n" c[i]}' OFS=" | " temp.txt
found duplicate name:
./b/f ab.txt | ./f a b.txt | ./fab.txt
For your first line to work with files with spaces, just put an eval in front:
eval arrFindFiles=($(find . -name "f*.txt" | sed 's/\ /\\\ /g'))
For your second line to work, put the double quotes only around ${arrFindFiles[@]}:
arrData+=($(awk -F, '{ if ($9$10!=NULL) a[$9$10$13]++ } END { for (b in a) { print b } }' "${arrFindFiles[@]}"))
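If eval makes you nervous, a sketch of an alternative that needs no escaping at all, assuming bash 4.4+ for mapfile -d '': read NUL-delimited names straight into the array, which survives spaces (and even newlines) in file names:
# -t strips the trailing NUL delimiter from each element
mapfile -t -d '' arrFindFiles < <(find . -name "f*.txt" -print0)
arrData+=("$(awk -F, '{ if ($9$10!=NULL) a[$9$10$13]++ } END { for (b in a) print b }' "${arrFindFiles[@]}")")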

Run on all files in a directory, treating each separately, using AWK

I want to run an AWK script on each file in a directory, but the AWK command must run only within that file - that is, it searches between a defined RS and returns what it is supposed to, but I need it to stop when it reaches the end of one file, and begin again at the start of the next file.
I have this running which works on each file individually:
awk '!/value1|value2/ && NF {print FILENAME " - " RS,$1}' RS="value" file1 > result.txt
But the output isn't correct when I apply it to each file in a directory using
find . -type f | awk ... file1 > result.txt
How would I get it to look at each file individually, but append the results into the same file? I have no idea unfortunately. I guess it's by adding each file into a variable and having AWK look at that, but I am not sure how to do it.
File file1:
interface Vlan3990
 ip address 172.17.226.23 255.255.255.254
File file2:
version 12.2
ip tacacs source-interface Loopback99
ip vrf test
 description xx
interface Loopback99
 description Management Loopback
interface Loopback100
 shutdown
Output
find . -type f | xargs awk '!/description|shutdown/ && NF {print FILENAME " - " RS,$1}' RS="interface" | more
./file1 - interface Vlan3990
./file2 - interface version
I am not sure where the output 'interface version' is coming from...
For just the current directory:
for file in *
do awk ... "$file"
done > result.txt
If you need to recurse into subdirectories:
find . -type f -exec awk ... {} \; > result.txt
In both cases, you should probably put result.txt in a different directory. Otherwise, it will be matched and used as an input file. Or use a wildcard that only matches the desired files, and doesn't match result.txt.
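For example, with the command from the question, you can exclude the output file explicitly; running one awk per file via \; also keeps a record from spanning two files:
find . -type f ! -name result.txt -exec awk '!/value1|value2/ && NF {print FILENAME " - " RS,$1}' RS="value" {} \; > result.txt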
Try this, given your 2 posted input files:
$ gawk '
function prtName() {
    if (name)
        print FILENAME, name
    name=""
}
/^interface/ { prtName(); name=$0 }
/^ (description|shutdown)/ { name="" }
ENDFILE { prtName() }
' file1 file2
file1 interface Vlan3990
Is that what you're looking for? Note that it's gawk-specific (courtesy of ENDFILE), which I assume is fine since the sample command you posted is also gawk-specific. If that's an issue, just add a test for FNR==1 and change ENDFILE to END:
$ awk '
function prtName() {
    if (name)
        print filename, name
    name=""
}
FNR==1 { prtName(); filename=FILENAME }
/^interface/ { prtName(); name=$0 }
/^ (description|shutdown)/ { name="" }
END { prtName() }
' file1 file2
file1 interface Vlan3990

Truncate NUL bytes off a file

I have about 500 files with trailing NUL bytes, maybe produced with
truncate -s 8M <file>
How can I cut off the zeroes?
This Perl script should do it:
for f in *; do
    perl -e '$/=undef;$_=<>;s|\0+$||;print;' < "$f" > "${f}_fixed"
done
This will keep all NULs within the file, remove any at the end, and save the result into <original filename>_fixed.
Script explanation: $/=undef tells perl to operate on the whole file rather than splitting it into lines; $_=<> loads the file; s|\0+$|| removes any run of NULs at the end of the loaded file 'string'; and print outputs the result. The rest is standard Bash file redirection.
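To sanity-check the result (file names here are illustrative), cmp reporting EOF on the _fixed copy confirms it is a byte-for-byte prefix of the original, and od lets you eyeball the tail for leftover NULs:
cmp file.bin file.bin_fixed        # expect: cmp: EOF on file.bin_fixed
od -c file.bin_fixed | tail -n 2   # inspect the last bytes of the fixed copy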
If the file is a "text" file and not a "binary" file, you can simply do
strings a.txt > b.txt
Use tr:
cat $input_file | tr -d '\0' > $output_file
Note that $input_file and $output_file must be different
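Note also that tr -d '\0' strips every NUL, including any in the middle of the file, not just trailing ones. If that's acceptable and you want in-place behaviour, a sketch via a temporary file (the mktemp usage is illustrative):
tmp=$(mktemp) &&
tr -d '\0' < "$input_file" > "$tmp" &&
mv "$tmp" "$input_file"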
Following the suggestion of @Eevee, you can actually avoid truncating the files below 8M. Using the following condition in your loop, and the fact that truncate assumes bytes as the default if you don't append a suffix to the size parameter, this won't pad the files below 8M:
for file in $(ls -c1 directory); do
    # ...
    SIZE=$(stat -c%s "$file")
    LIMIT=$((8 * 1024 * 1024))
    if [ "$SIZE" -lt "$LIMIT" ]; then
        truncate -s "$SIZE" "$file"
    else
        truncate -s 8M "$file"
    fi
    # ...
done
There's not really a Unix tool for this particular case. Here's a Python (3) script:
import sys

for fn in sys.argv[1:]:
    with open(fn, 'rb') as f:
        contents = f.read()
    with open(fn, 'wb') as f:
        f.write(contents.rstrip(b'\0'))
Run as:
python retruncate.py file1 file2 files* etc...

First line of every file in a new file

How can I get the first line of EVERY file in a directory and save them all in a new file?
#!/bin/bash
rm FIRSTLINE
for file in "$(find $1 -type f)";
do
    head -1 $file >> FIRSTLINE
done
cat FIRSTLINE
This is my bash script, but when I do this and I open the file FIRSTLINE,
then I see this:
==> 'path of the file' <==
'first line' of the file
and this for all the files in my argument.
Does anybody have a solution?
find . -type f -exec head -1 \{\} \; > YOURFILE
might work for you.
The problem is that you've quoted the output of find so it gets treated as a single string, so the for loop only runs once, with a single argument containing all the files. That means you run head -1 file1 file2 file3 file4 ... etc. and when given multiple files head prints the ==> file1 <== headers.
So to fix it, remove the double quotes around the find shell-out, which ensures you run the for loop once for each file, as intended. Also, the semi-colon after the shell-out is unnecessary.
#!/bin/bash
rm FIRSTLINE
for file in $(find $1 -type f)
do
    head -1 $file >> FIRSTLINE
done
cat FIRSTLINE
This has some style issues, though: do you really need to write to a file and then cat the file to stdout? You could just print the output to stdout:
#!/bin/bash
for file in $(find $1 -type f)
do
    head -1 $file
done
Personally I'd write it like this:
find $1 -type f | xargs -L1 head -1
or if you need the output in the file and printed to stdout:
find $1 -type f | xargs -L1 head -1 | tee FIRSTLINE
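If the file names may contain spaces or glob characters, a more defensive sketch using NUL-delimited find output (write FIRSTLINE outside the searched directory so it doesn't match itself):
find "$1" -type f -print0 | while IFS= read -r -d '' file; do
    head -n 1 "$file"
done > FIRSTLINE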
$ for file in $(find $1 -type f); do
    echo ''
    echo $file
    head -n 4 $file
done
For gzip files, for instance:
for file in `ls *.gz`; do gzcat $file | head -n 1; done > toto.txt
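A variant of the same loop for GNU/Linux, where the command is usually zcat rather than gzcat; globbing instead of parsing ls, plus quoting, also keeps names with spaces intact:
for file in *.gz; do zcat "$file" | head -n 1; done > toto.txt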

Batch rename sequential files by padding with zeroes

I have a bunch of files named like so:
output_1.png
output_2.png
...
output_10.png
...
output_120.png
What is the easiest way of renaming those to match a convention, e.g. padding to a maximum of four digits, so that the files are named:
output_0001.png
output_0002.png
...
output_0010.png
output_0120.png
This should be easy in Unix/Linux/BSD, although I also have access to Windows. Any language is fine, but I'm interested in some really neat one-liners (if there are any?).
Python
import os

path = '/path/to/files/'
for filename in os.listdir(path):
    prefix, num = filename[:-4].split('_')
    num = num.zfill(4)
    new_filename = prefix + "_" + num + ".png"
    os.rename(os.path.join(path, filename), os.path.join(path, new_filename))
You could compile a list of valid filenames first, assuming that all files that start with "output_" and end with ".png" are valid files:
l = [(x, "output_" + x[7:-4].zfill(4) + ".png") for x in os.listdir(path) if x.startswith("output_") and x.endswith(".png")]
for oldname, newname in l:
    os.rename(os.path.join(path, oldname), os.path.join(path, newname))
Bash
(from: http://www.walkingrandomly.com/?p=2850)
In other words, I replace file1.png with file001.png and file20.png with file020.png and so on. Here's how to do that in bash:
#!/bin/bash
num=`expr match "$1" '[^0-9]*\([0-9]\+\).*'`
paddednum=`printf "%03d" $num`
echo ${1/$num/$paddednum}
Save the above to a file called zeropad.sh and then do the following command to make it executable
chmod +x ./zeropad.sh
You can then use the zeropad.sh script as follows
./zeropad.sh frame1.png
which will return the result
frame001.png
All that remains is to use this script to rename all of the .png files in the current directory such that they are zeropadded.
for i in *.png;do mv $i `./zeropad.sh $i`; done
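The same loop with quoting, in case any of the names contain spaces (behaviour is otherwise unchanged):
for i in *.png; do mv "$i" "$(./zeropad.sh "$i")"; done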
Perl
(from: Zero pad rename e.g. Image (2).jpg -> Image (002).jpg)
use strict;
use warnings;
use File::Find;
sub pad_left {
    my $num = shift;
    if ($num < 10) {
        $num = "00$num";
    }
    elsif ($num < 100) {
        $num = "0$num";
    }
    return $num;
}

sub new_name {
    if (/\.jpg$/) {
        my $name = $File::Find::name;
        my $new_name;
        ($new_name = $name) =~ s/^(.+\/[\w ]+\()(\d+)\)/$1 . &pad_left($2) . ')'/e;
        rename($name, $new_name);
        print "$name --> $new_name\n";
    }
}

# invoke the script in the parent directory of the image-containing sub-directories
chomp(my $localdir = `pwd`);
find(\&new_name, $localdir);
Rename
Also from above answer:
rename 's/\d+/sprintf("%04d",$&)/e' *.png
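Assuming this is the Perl-based rename (File::Rename), it accepts -n for a dry run, which is worth doing before touching a few hundred files:
rename -n 's/\d+/sprintf("%04d",$&)/e' *.png   # prints the planned renames without performing them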
Fairly easy, although it combines a few features not immediately obvious:
@echo off
setlocal enableextensions enabledelayedexpansion
rem iterate over all PNG files:
for %%f in (*.png) do (
    rem store file name without extension
    set FileName=%%~nf
    rem strip the "output_"
    set FileName=!FileName:output_=!
    rem Add leading zeroes:
    set FileName=000!FileName!
    rem Trim to only four digits, from the end
    set FileName=!FileName:~-4!
    rem Add "output_" and extension again
    set FileName=output_!FileName!%%~xf
    rem Rename the file
    rename "%%f" "!FileName!"
)
Edit: Misread that you're not after a batch file but any solution in any language. Sorry for that. To make up for it, a PowerShell one-liner:
gci *.png|%{rni $_ ('output_{0:0000}.png' -f +($_.basename-split'_')[1])}
Stick a ?{$_.basename-match'_\d+'} in there if you have other files that do not follow that pattern.
I actually just needed to do this on OS X. Here's the script I created for it, a single line:
for i in output_*.png; do mv $i `printf output_%04d.png $(echo $i | sed 's/[^0-9]*//g')`; done
For mass renaming the only safe solution is mmv—it checks for collisions and allows renaming in chains and cycles, something that is beyond most scripts. Unfortunately, zero padding it ain't too hot at. A flavour:
c:> mmv output_[0-9].png output_000#1.png
Here's one workaround:
c:> type file
mmv
[^0-9][0-9] #1\00#2
[^0-9][0-9][^0-9] #1\00#2#3
[^0-9][0-9][0-9] #1\0#2#3
[^0-9][0-9][0-9][^0-9] #1\0#2#3
c:> mmv <file
Here is a Python script I wrote that pads zeroes depending on the largest number present and ignores non-numbered files in the given directory. Usage:
python ensure_zero_padding_in_numbering_of_files.py /path/to/directory
Body of script:
import argparse
import os
import re
import sys

def main(cmdline):
    parser = argparse.ArgumentParser(
        description='Ensure zero padding in numbering of files.')
    parser.add_argument('path', type=str,
                        help='path to the directory containing the files')
    args = parser.parse_args()
    path = args.path

    numbered = re.compile(r'(.*?)(\d+)\.(.*)')
    numbered_fnames = [fname for fname in os.listdir(path)
                       if numbered.search(fname)]
    max_digits = max(len(numbered.search(fname).group(2))
                     for fname in numbered_fnames)

    for fname in numbered_fnames:
        _, prefix, num, ext, _ = numbered.split(fname, maxsplit=1)
        num = num.zfill(max_digits)
        new_fname = "{}{}.{}".format(prefix, num, ext)
        if fname != new_fname:
            os.rename(os.path.join(path, fname), os.path.join(path, new_fname))
            print "Renamed {} to {}".format(fname, new_fname)
        else:
            print "{} seems fine".format(fname)

if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))
$ rename output_ output_0 output_?.png    # adding 1 zero to names ending in 1 digit
$ rename output_ output_0 output_??.png   # adding 1 zero to names ending in 2 digits
$ rename output_ output_0 output_???.png  # adding 1 zero to names ending in 3 digits
That's it!
With bash parameter expansion:
Linux:
for f in *.png; do n=${f#*_}; n=${n%.*}; mv $f $(printf output_"%04d".png $n); done
Windows (bash):
for f in *.png; do n=${f#*_}; mv $f $(printf output_"%08s" $n); done
I'm following on from Adam's solution for OS X. Some gotchas I encountered in my scenario were:
I had a set of .mp3 files, so the sed was catching the '3' in the '.mp3' suffix. (I used basename instead of echo to rectify this.)
My .mp3s had spaces within their names, e.g. "audio track 1.mp3", which was causing basename+sed to screw up a little, so I had to quote the "$i" parameter.
In the end, my conversion line looked like this:
for i in *.mp3 ; do mv "$i" `printf "track_%02d.mp3\n" $(basename "$i" .mp3 | sed 's/[^0-9]*//g')` ; done
Using ls + awk + sh:
ls -1 | awk -F_ '{printf "%s%04d.png\n", "mv "$0" "$1"_", $2}' | sh
If you want to test the command before running it, just remove the | sh.
I just wanted to make a time-lapse movie using
ffmpeg -pattern_type glob -i "*.jpg" -s:v 1920x1080 -c:v libx264 output.mp4
and got a similar problem:
[image2 @ 000000000039c300] Pattern type 'glob' was selected but globbing is not supported by this libavformat build
glob is not supported on Windows 7.
Also, if the file list is like below and you use %2d.jpg or %02d.jpg:
1.jpg
2.jpg
...
10.jpg
11.jpg
...
[image2 @ 00000000005ea9c0] Could find no file with path '%2d.jpg' and index in the range 0-4
%2d.jpg: No such file or directory
[image2 @ 00000000005aa980] Could find no file with path '%02d.jpg' and index in the range 0-4
%02d.jpg: No such file or directory
Here is my batch script to rename the files:
@echo off
setlocal enabledelayedexpansion
set i=1000000
set X=1
for %%a in (*.jpg) do (
    set /a i+=1
    set "filename=!i:~%X%!"
    echo ren "%%a" "!filename!%%~xa"
    ren "%%a" "!filename!%%~xa"
)
After renaming the 143,323 jpg files:
ffmpeg -i %06d.jpg -s:v 1920x1080 -c:v libx264 output.mp4
