Rename multiple files based on pattern in Unix - file

There are multiple files in a directory that begin with prefix fgh, for example:
fghfilea
fghfileb
fghfilec
I want to rename all of them to begin with prefix jkl. Is there a single command to do that instead of renaming each file individually?

There are several ways, but using rename will probably be the easiest.
Using one version of rename (Perl's rename):
rename 's/^fgh/jkl/' fgh*
Using another version of rename (same as Judy2K's answer):
rename fgh jkl fgh*
You should check your platform's man page to see which of the above applies.

This is how sed and mv can be used together to do rename:
for f in fgh*; do mv "$f" $(echo "$f" | sed 's/^fgh/jkl/g'); done
As per comment below, if the file names have spaces in them, quotes may need to surround the sub-function that returns the name to move the files to:
for f in fgh*; do mv "$f" "$(echo $f | sed 's/^fgh/jkl/g')"; done

rename might not be in every system. so if you don't have it, use the shell
this example in bash shell
for f in fgh*; do mv "$f" "${f/fgh/xxx}";done

Using mmv:
mmv "fgh*" "jkl#1"

There are many ways to do it (not all of these will work on all unixy systems):
ls | cut -c4- | xargs -I§ mv fgh§ jkl§
The § may be replaced by anything you find convenient. You could do this with find -exec too but that behaves subtly different on many systems, so I usually avoid that
for f in fgh*; do mv "$f" "${f/fgh/jkl}";done
Crude but effective as they say
rename 's/^fgh/jkl/' fgh*
Real pretty, but rename is not present on BSD, which is the most common unix system afaik.
rename fgh jkl fgh*
ls | perl -ne 'chomp; next unless -e; $o = $_; s/fgh/jkl/; next if -e; rename $o, $_';
If you insist on using Perl, but there is no rename on your system, you can use this monster.
Some of those are a bit convoluted and the list is far from complete, but you will find what you want here for pretty much all unix systems.

rename fgh jkl fgh*

Using find, xargs and sed:
find . -name "fgh*" -type f -print0 | xargs -0 -I {} sh -c 'mv "{}" "$(dirname "{}")/`echo $(basename "{}") | sed 's/^fgh/jkl/g'`"'
It's more complex than #nik's solution but it allows to rename files recursively. For instance, the structure,
.
├── fghdir
│   ├── fdhfilea
│   └── fghfilea
├── fghfile\ e
├── fghfilea
├── fghfileb
├── fghfilec
└── other
├── fghfile\ e
├── fghfilea
├── fghfileb
└── fghfilec
would be transformed to this,
.
├── fghdir
│   ├── fdhfilea
│   └── jklfilea
├── jklfile\ e
├── jklfilea
├── jklfileb
├── jklfilec
└── other
├── jklfile\ e
├── jklfilea
├── jklfileb
└── jklfilec
The key to make it work with xargs is to invoke the shell from xargs.

Generic command would be
find /path/to/files -name '<search>*' -exec bash -c 'mv $0 ${0/<search>/<replace>}' {} \;
where <search> and <replace> should be replaced with your source and target respectively.
As a more specific example tailored to your problem (should be run from the same folder where your files are), the above command would look like:
find . -name 'gfh*' -exec bash -c 'mv $0 ${0/gfh/jkl}' {} \;
For a "dry run" add echo before mv, so that you'd see what commands are generated:
find . -name 'gfh*' -exec bash -c 'echo mv $0 ${0/gfh/jkl}' {} \;

To install the Perl rename script:
sudo cpan install File::Rename
There are two renames as mentioned in the comments in Stephan202's answer.
Debian based distros have the Perl rename. Redhat/rpm distros have the C rename.
OS X doesn't have one installed by default (at least in 10.8), neither does Windows/Cygwin.

Here's a way to do it using command-line Groovy:
groovy -e 'new File(".").eachFileMatch(~/fgh.*/) {it.renameTo(it.name.replaceFirst("fgh", "jkl"))}'

On Solaris you can try:
for file in `find ./ -name "*TextForRename*"`; do
mv -f "$file" "${file/TextForRename/NewText}"
done

#!/bin/sh
#replace all files ended witn .f77 to .f90 in a directory
for filename in *.f77
do
#echo $filename
#b= echo $filename | cut -d. -f1
#echo $b
mv "${filename}" "${filename%.f77}.f90"
done

This script worked for me for recursive renaming with directories/file names possibly containing white-spaces:
find . -type f -name "*\;*" | while read fname; do
dirname=`dirname "$fname"`
filename=`basename "$fname"`
newname=`echo "$filename" | sed -e "s/;/ /g"`
mv "${dirname}/$filename" "${dirname}/$newname"
done
Notice the sed expression which in this example replaces all occurrences of ; with space . This should of course be replaced according to the specific needs.

Using StringSolver tools (windows & Linux bash) which process by examples:
filter fghfilea ok fghreport ok notfghfile notok; mv --all --filter fghfilea jklfilea
It first computes a filter based on examples, where the input is the file names and the output (ok and notok, arbitrary strings). If filter had the option --auto or was invoked alone after this command, it would create a folder ok and a folder notok and push files respectively to them.
Then using the filter, the mv command is a semi-automatic move which becomes automatic with the modifier --auto. Using the previous filter thanks to --filter, it finds a mapping from fghfilea to jklfilea and then applies it on all filtered files.
Other one-line solutions
Other equivalent ways of doing the same (each line is equivalent), so you can choose your favorite way of doing it.
filter fghfilea ok fghreport ok notfghfile notok; mv --filter fghfilea jklfilea; mv
filter fghfilea ok fghreport ok notfghfile notok; auto --all --filter fghfilea "mv fghfilea jklfilea"
# Even better, automatically infers the file name
filter fghfilea ok fghreport ok notfghfile notok; auto --all --filter "mv fghfilea jklfilea"
Multi-step solution
To carefully find if the commands are performing well, you can type the following:
filter fghfilea ok
filter fghfileb ok
filter fghfileb notok
and when you are confident that the filter is good, perform the first move:
mv fghfilea jklfilea
If you want to test, and use the previous filter, type:
mv --test --filter
If the transformation is not what you wanted (e.g. even with mv --explain you see that something is wrong), you can type mv --clear to restart moving files, or add more examples mv input1 input2 where input1 and input2 are other examples
When you are confident, just type
mv --filter
and voilà! All the renaming is done using the filter.
DISCLAIMER: I am a co-author of this work made for academic purposes. There might also be a bash-producing feature soon.

It was much easier (on my Mac) to do this in Ruby. Here are 2 examples:
# for your fgh example. renames all files from "fgh..." to "jkl..."
files = Dir['fgh*']
files.each do |f|
f2 = f.gsub('fgh', 'jkl')
system("mv #{f} #{f2}")
end
# renames all files in directory from "021roman.rb" to "021_roman.rb"
files = Dir['*rb'].select {|f| f =~ /^[0-9]{3}[a-zA-Z]+/}
files.each do |f|
f1 = f.clone
f2 = f.insert(3, '_')
system("mv #{f1} #{f2}")
end

Using renamer:
$ renamer --find /^fgh/ --replace jkl * --dry-run
Remove the --dry-run flag once you're happy the output looks correct.

My version of renaming mass files:
for i in *; do
echo "mv $i $i"
done |
sed -e "s#from_pattern#to_pattern#g” > result1.sh
sh result1.sh

Another possible parameter expansion:
for f in fgh*; do mv -- "$f" "jkl${f:3}"; done

I would recommend using my own script, which solves this problem. It also has options to change the encoding of the file names, and to convert combining diacriticals to precomposed characters, a problem I always have when I copy files from my Mac.
#!/usr/bin/perl
# Copyright (c) 2014 André von Kugland
# Permission is hereby granted, free of charge, to any person obtaining a
# copy of this software and associated documentation files (the "Software"),
# to deal in the Software without restriction, including without limitation
# the rights to use, copy, modify, merge, publish, distribute, sublicense,
# and/or sell copies of the Software, and to permit persons to whom the
# Software is furnished to do so, subject to the following conditions:
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
# DEALINGS IN THE SOFTWARE.
$help_msg =
"rename.pl, a script to rename files in batches, using Perl
expressions to transform their names.
Usage:
rename.pl [options] FILE1 [FILE2 ...]
Where options can be:
-v Verbose.
-vv Very verbose.
--apply Really apply modifications.
-e PERLCODE Execute PERLCODE. (e.g. 's/a/b/g')
--from-charset=CS Source charset. (e.g. \"iso-8859-1\")
--to-charset=CS Destination charset. (e.g. \"utf-8\")
--unicode-normalize=NF Unicode normalization form. (e.g. \"KD\")
--basename Modifies only the last element of the path.
";
use Encode;
use Getopt::Long;
use Unicode::Normalize 'normalize';
use File::Basename;
use I18N::Langinfo qw(langinfo CODESET);
Getopt::Long::Configure ("bundling");
# ----------------------------------------------------------------------------------------------- #
# Our variables. #
# ----------------------------------------------------------------------------------------------- #
my $apply = 0;
my $verbose = 0;
my $help = 0;
my $debug = 0;
my $basename = 0;
my $unicode_normalize = "";
my #scripts;
my $from_charset = "";
my $to_charset = "";
my $codeset = "";
# ----------------------------------------------------------------------------------------------- #
# Get cmdline options. #
# ----------------------------------------------------------------------------------------------- #
$result = GetOptions ("apply" => \$apply,
"verbose|v+" => \$verbose,
"execute|e=s" => \#scripts,
"from-charset=s" => \$from_charset,
"to-charset=s" => \$to_charset,
"unicode-normalize=s" => \$unicode_normalize,
"basename" => \$basename,
"help|h|?" => \$help,
"debug" => \$debug);
# If not going to apply, then be verbose.
if (!$apply && $verbose == 0) {
$verbose = 1;
}
if ((($#scripts == -1)
&& (($from_charset eq "") || ($to_charset eq ""))
&& $unicode_normalize eq "")
|| ($#ARGV == -1) || ($help)) {
print $help_msg;
exit(0);
}
if (($to_charset ne "" && $from_charset eq "")
||($from_charset eq "" && $to_charset ne "")
||($to_charset eq "" && $from_charset eq "" && $unicode_normalize ne "")) {
$codeset = langinfo(CODESET);
$to_charset = $codeset if $from_charset ne "" && $to_charset eq "";
$from_charset = $codeset if $from_charset eq "" && $to_charset ne "";
}
# ----------------------------------------------------------------------------------------------- #
# Composes the filter function using the #scripts array and possibly other options. #
# ----------------------------------------------------------------------------------------------- #
$f = "sub filterfunc() {\n my \$s = shift;\n";
$f .= " my \$d = dirname(\$s);\n my \$s = basename(\$s);\n" if ($basename != 0);
$f .= " for (\$s) {\n";
$f .= " $_;\n" foreach (#scripts); # Get scripts from '-e' opt. #
# Handle charset translation and normalization.
if (($from_charset ne "") && ($to_charset ne "")) {
if ($unicode_normalize eq "") {
$f .= " \$_ = encode(\"$to_charset\", decode(\"$from_charset\", \$_));\n";
} else {
$f .= " \$_ = encode(\"$to_charset\", normalize(\"$unicode_normalize\", decode(\"$from_charset\", \$_)));\n"
}
} elsif (($from_charset ne "") || ($to_charset ne "")) {
die "You can't use `from-charset' nor `to-charset' alone";
} elsif ($unicode_normalize ne "") {
$f .= " \$_ = encode(\"$codeset\", normalize(\"$unicode_normalize\", decode(\"$codeset\", \$_)));\n"
}
$f .= " }\n";
$f .= " \$s = \$d . '/' . \$s;\n" if ($basename != 0);
$f .= " return \$s;\n}\n";
print "Generated function:\n\n$f" if ($debug);
# ----------------------------------------------------------------------------------------------- #
# Evaluates the filter function body, so to define it in our scope. #
# ----------------------------------------------------------------------------------------------- #
eval $f;
# ----------------------------------------------------------------------------------------------- #
# Main loop, which passes names through filters and renames files. #
# ----------------------------------------------------------------------------------------------- #
foreach (#ARGV) {
$old_name = $_;
$new_name = filterfunc($_);
if ($old_name ne $new_name) {
if (!$apply or (rename $old_name, $new_name)) {
print "`$old_name' => `$new_name'\n" if ($verbose);
} else {
print "Cannot rename `$old_name' to `$new_name'.\n";
}
} else {
print "`$old_name' unchanged.\n" if ($verbose > 1);
}
}

This worked for me using regexp:
I wanted files to be renamed like this:
file0001.txt -> 1.txt
ofile0002.txt -> 2.txt
f_i_l_e0003.txt -> 3.txt
usig the [a-z|_]+0*([0-9]+.) regexp where ([0-9]+.) is a group substring to use on the rename command
ls -1 | awk 'match($0, /[a-z|\_]+0*([0-9]+.*)/, arr) { print arr[0] " " arr[1] }'|xargs -l mv
Produces:
mv file0001.txt 1.txt
mv ofile0002.txt 2.txt
mv f_i_l_e0003.txt 3.txt
Another example:
file001abc.txt -> abc1.txt
ofile0002abcd.txt -> abcd2.txt
ls -1 | awk 'match($0, /[a-z|\_]+0*([0-9]+.*)([a-z]+)/, arr) { print arr[0] " " arr[2] arr[1] }'|xargs -l mv
Produces:
mv file001abc.txt abc1.txt
mv ofile0002abcd.txt abcd2.txt
Warning, be careful.

I wrote this script to search for all .mkv files recursively renaming found files to .avi. You can customize it to your neeeds. I've added some other things such as getting file directory, extension, file name from a file path just incase you need to refer to something in the future.
find . -type f -name "*.mkv" | while read fp; do
fd=$(dirname "${fp}");
fn=$(basename "${fp}");
ext="${fn##*.}";
f="${fn%.*}";
new_fp="${fd}/${f}.avi"
mv -v "$fp" "$new_fp"
done;

A generic script to run a sed expression on a list of files (combines the sed solution with the rename solution):
#!/bin/sh
e=$1
shift
for f in $*; do
fNew=$(echo "$f" | sed "$e")
mv "$f" "$fNew";
done
Invoke by passing the script a sed expression, and then any list of files, just like a version of rename:
script.sh 's/^fgh/jkl/' fgh*

You can also use below script. it is very easy to run on terminal...
//Rename multiple files at a time
for file in FILE_NAME*
do
mv -i "${file}" "${file/FILE_NAME/RENAMED_FILE_NAME}"
done
Example:-
for file in hello*
do
mv -i "${file}" "${file/hello/JAISHREE}"
done

This is an extended version of the find + sed + xargs solution.
Original solutions: this and this.
Requirements: search, prune, regex, rename
I want to rename multiple files in many folders.
Some folders should be pruned/excluded.
I am on cygwin and cannot get perl rename to work, which is required for the most popular solution (and I assume it to be slow, since it does not seem to have a pruning option?)
Solution
Use find to get files effectively (with pruning), and with many customization options.
Use sed for regex replacement.
Use xargs to funnel the result into the final command.
Example 1: rename *.js files but ignore node_modules
This example finds files and echos the found file and the renamed file. For safety reasons, it does not move anything for now. You have to replace echo with mv for that.
set -x # stop on error
set -e # verbose mode (echo all commands)
find "." -type f -not \( -path "**/node_modules/**" -prune \) -name "*.js" |
sed -nE "s/(.*)\/my(.*)/& \1\/YOUR\2/p" |
xargs -n 2 echo # echo first (replace with `mv` later)
The above script turns this:
./x/y/my-abc.js
Into this:
./x/y/YOUR-abc.js
Breakdown of Solution
find "." -type f -not \( -path "**/node_modules/**" -prune \) -name "*.js"
Searches for files (-type f).
The -not part excludes (and, importantly does not traverse!) the (notoriously ginormous) node_modules folder.
File name must match "*.js".
You can add more include and exclude clauses.
Refs:
This post discusses recursive file finding alternatives.
This post discusses aspects of pruning and excluding.
man find
sed -nE "s/(.*)\/my\-(.*\.js)/& \1\/YOUR-\2/p"
NOTE: sed always takes some getting used to.
-E enables "extended" (i.e. more modern) regex syntax.
-n is used in combination with the trailing /p flag: -n hides all results, while /p will print only matching results. This way, we only see/move files that need changing, and ignore all others.
Replacement regex with sed (and other regex tools) is always of the format: s/regex/replacement/FLAGS
In replacement, the & represents the matched input string. This will be the first argument to mv.
Refs:
linux regex tutorial
man sed
xargs -n 2 echo
Run the command echo with (the first two strings of) the replaced string.
Refs: man xargs
Good luck!

Related

Copy SRC directory with adding a prefix to all c-library functions

I have a embedded C static library that communicates with a hardware peripheral. It currently does not support multiple hardware peripherals, but I need to interface to a second one. I do not care about code footprint rightnow. So I want to duplicate that library; one for each hardware.
This of course, will result in symbol collision. A good method is to use objcopy to add a prefix to object files. So I can get hw1_fun1.o and hw2_fun1.o. This post illustrates it.
I want to add a prefix to all c functions on the source level, not the object. Because I will need to modify a little bit for hw2.
Is there any script, c-preprocessor, tool that can make something like:
./scriptme --prefix=hw2 ~/src/ ~/dest/
I'll be grateful :)
I wrote a simple bash script that does the required function, or sort of. I hope it help someone one day.
#!/bin/sh
DIR_BIN=bin/ext/lwIP/
DIR_SRC=src/ext/lwIP/
DIR_DST=src/hw2_lwIP/
CMD_NM=mb-nm
[ -d $DIR_DST ] || ( echo "Destination directory does not exist!" && exit 1 );
cp -r $DIR_SRC/* $DIR_DST/
chmod -R 755 $DIR_DST # cygwin issue with Windows7
sync # file permissions. (Pure MS shit!)
funs=`find $DIR_BIN -name *.o | xargs $CMD_NM | grep " R \| T " | awk '{print $3}'`
echo "Found $(echo $funs | wc -w) functions, processing:"
for fun in $funs;
do
echo " $fun";
find $DIR_DST -type f -exec sed -i "s/$fun/hw2_$fun/g" {} \;
done;
echo "Done! Now change includes and compile your project ;-)"

Script to group numbered files into folders

I have around a million files in one folder in the form xxxx_description.jpg where xxx is a number ranging from 100 to an unknown upper.
The list is similar to this:
146467_description1.jpg
146467_description2.jpg
146467_description3.jpg
146467_description4.jpg
14646_description1.jpg
14646_description2.jpg
14646_description3.jpg
146472_description1.jpg
146472_description2.jpg
146472_description3.jpg
146500_description1.jpg
146500_description2.jpg
146500_description3.jpg
146500_description4.jpg
146500_description5.jpg
146500_description6.jpg
To get the file number down in the at folder I'd like to put them all into folders grouped by the number at the start.
ie:
146467/146467_description1.jpg
146467/146467_description2.jpg
146467/146467_description3.jpg
146467/146467_description4.jpg
14646/14646_description1.jpg
14646/14646_description2.jpg
14646/14646_description3.jpg
146472/146472_description1.jpg
146472/146472_description2.jpg
146472/146472_description3.jpg
146500/146500_description1.jpg
146500/146500_description2.jpg
146500/146500_description3.jpg
146500/146500_description4.jpg
146500/146500_description5.jpg
146500/146500_description6.jpg
I was thinking to try and use command line: find | awk {} | mv command or maybe write a script, but I'm not sure how to do this most efficiently.
If you really are dealing with millions of files, I suspect that a glob (*.jpg or [0-9]*_*.jpg may fail because it makes a command line that's too long for the shell. If that's the case, you can still use find. Something like this might work:
find /path -name "[0-9]*_*.jpg" -exec sh -c 'f="{}"; mkdir -p "/target/${f%_*}"; mv "$f" "/target/${f%_*}/"' \;
Broken out for easier reading, this is what we're doing:
find /path - run find, with /path as a starting point,
-name "[0-9]*_*.jpg" - match files that match this filespec in all directories,
-exec sh -c execute the following on each file...
'f="{}"; - put the filename into a variable...
mkdir -p "/target/${f%_*}"; - make a target directory based on that variable (read mkdir's man page about the -p option)
mv "$f" "/target/${f%_*}/"' - move the file into the directory.
\; - end the -exec expression
On the up side, it can handle any number of files that find can handle (i.e. limited only by your OS). On the down side, it's launching a separate shell for each file to be handled.
Note that the above answer is for Bourne/POSIX/Bash. If you're using CSH or TCSH as your shell, the following might work instead:
#!/bin/tcsh
foreach f (*_*.jpg)
set split = ($f:as/_/ /)
mkdir -p "$split[1]"
mv "$f" "$split[1]/"
end
This assumes that the filespec will fit in tcsh's glob buffer. I've tested with 40000 files (894KB) on one command line and not had a problem using /bin/sh or /bin/csh in FreeBSD.
Like the Bourne/POSIX/Bash parameter expansion solution above, this avoids unnecessary calls to external I haven't tested that, and would recommend the find solution even though it's slower.
You can use this script:
for i in [0-9]*_*.jpg; do
p=`echo "$i" | sed 's/^\([0-9]*\)_.*/\1/'`
mkdir -p "$p"
mv "$i" "$p"
done
Using grep
for file in *.jpg;
do
dirName=$(echo $file | grep -oE '^[0-9]+')
[[ -d $dirName ]] || mkdir $dirName
mv $file $dirName
done
grep -oE '^[0-9]+' extracts the starting digits in the filename as
146467
146467
146467
146467
14646
...
[[ -d $dirName ]] returns 1 if the directory exists
[[ -d $dirName ]] || mkdir $dirName ensures that the mkdir works only if the test [[ -d $dirName ]] fails, that is the direcotry does not exists

Move files containing X but not containing Y

To manage my backup sync folder, I am trying to come up with a command that would move files beginning with string1* but NOT ending with *string2 from /folder1 to /folder2
What would a command containing such two opposite conditions (HAS and HAS NOT) look like?
#!/bin/bash
for i in `ls -d /folder1/string1* | grep -v 'string2$'`
do
ls -ld $i | grep '^-' > /dev/null # Test that we have a regular file and not a directory etc.
if [ $? == 0 ]; then
mv $i /folder2
fi
done
Try something like
find /folder1 -mindepth 1 -maxdepth 1 -type f \
-name 'string1*' \! -name '*string2' -exec cp -iv {} /folder2 +
Note: If your have a older version of find you can replace + with \;
To me this is another case for (what I shall denote) the read while pattern.
cd /folder1
ls string1* | grep -v 'string2$' | while read f; do mv $f /folder2; done
The other answers are good alternatives, and in particular, find can do a lot. But I always get a headache using find, and never quite use it enough to do so without the manpage open.
Also, starting with ls or a simple find to get a list of files, and then using any or all of sed, awk, grep or whatever you have to hand, to adjust/trim/extend this list, and then bunging it into a loop, is a crude(ish) but pretty powerful technique.

Looking to take only main folder name within a tarball & match it to folders to see if it's been extracted

I have a situation where I need to keep .tgz files & if they've been extracted, remove the extracted directory & contents.
In all examples, the only top-level directory within the tarball has a different name than the tarball itself:
[host1]$ find / -name "*\#*.tgz" #(has an # symbol somewhere in the name)
/1-#-test.tgz
[host1]$ tar -tzvf /1-#-test.tgz | head -n 1 | awk '{ print $6 }'
TJ #(directory name)
What I'd like to accomplish (pulling my hair out; rusty scripting fingers), is to look at each tarball, see if the corresponding directory name (like above) exists. If it does, echo "rm -rf /directoryname" into an output file for review.
I can read all of the tarballs into an array ... but how to check the directories?
Frustrated & appreciate any help.
Maybe you're looking for something like this:
find / -name "*#*.tgz" | while read line; do
dir=$(tar ztf "$line" | awk -F/ '{print $6; exit}')
test -d "$dir" && echo "rm -fr '$dir'"
done
Explanation:
We iterate over the *#*.tgz files found with a while loop, line by line
Get the list of files in the tgz file with tar ztf "$line"
Since paths are separated by /, use that as the separator in the awk, print the 6th field. After the print we exit, making this equivalent to but more efficient than using head -n1 first
With dir=$(...) we put the entire output of the tar..awk chain, thus the 6th field of the first file in the tar, into the variable dir
We check if such directory exists, if yes then echo an rm command so you can review and execute later if looks good
My original answer used a find ... -exec but I think that's not so good in this particular case:
find / -name "*#*.tgz" -exec \
sh -c 'dir=$(tar ztf "{}" | awk -F/ "{print \$6; exit}");\
test -d "$dir" && echo "rm -fr \"$dir\""' \;
It's not so good because of running sh for every file, and since we are using {} in the subshell, we lose the usual benefits of a typical find ... -exec where special characters in {} are correctly handled.

How to diff just the source files?

I have two almost similar source code trees, but do not have access to the source code repository so I am stuck with release packages that contain also test reports, documentation, binaries etc.
the diff command only support --exclude, but I would like to do something like diff -wbur --include='*.c,*.h' tree1 tree2
I know that this question is somewhat related, but does not really address my issue.
Bonus points for ignoring change blocks that are completely in C comments :)
Little modification to a result from google helped, in tree1 did find . -name '*.[ch]' -exec diff -wibu {} ../tree2/{} \;
Here's a little patch maker script:
#!/bin/bash
USAGE="USAGE: $0 <dist dir> <edited dir>"
[ '--help' == "$1" ] && { echo $USAGE; exit 0; }
[ 2 -eq $# ] || { echo $USAGE; exit 1; }
# trim starting './' and trailing /'/
original=$(echo $1 | sed 's-^\./--;s-/$--')
changed=$(echo $2 | sed 's-^\./--;s-/$--')
[ -d $original ] || { echo "ERROR: Directory $original does not exist" >&2 ; exit 2; }
[ -d $changed ] || { echo "ERROR: Directory $changed does not exist" >&2; exit 3; }
#command="ls -l"
command="diff -Naur"
find $original -name '*.[ch]' -o -name '*.cpp' | sed 's-^[^/]*/--' | { while read file; do $command $original/$file $changed/$file; done; }
I would exclude everything that doesn't match .c or .h. So it means it will only include .c and .h files :
diff -x "*.[^ch]"
For me it is the best way to do because you are only using diff
You can write all files to exclude into a temporary file and give it to diff's -X argument.
find tree1 tree2 -type f -not -name '*.[ch]' >exludes
diff -wbur tree1 tree2 -X excludes
Or simplier (works in Bash):
diff -wbur tree1 tree2 -X <(find tree1 tree2 -type f -not -name '*.[ch]')
You can use multiple name arguments if you have longer filename extensions:
diff -wbur tree1 tree2 \
-X <(find tree1 tree2 -type f -not -name '*.java' -and -not -name '*.sql')
I proposed an easier solution than find to diff maintainers, here is the thread:
http://lists.gnu.org/archive/html/bug-diffutils/2014-10/msg00000.html
The idea is provide a new option which instructs diff to parse on files that match a regex, like:
diff -Nurp --only "*.[hc]" source/ source-new/
Here is the instructions to patch diffutils
Clone the repo
git clone git://git.savannah.gnu.org/diffutils.git
run bootstrap.sh inside diffutils directory and resolve the dependencies until it creates the ./cofigure script
Download the patch from link above
Apply it
git apply <PATCHFILE>
Configure and compile
./configure
make
This will create the patched diff in src/diff
Cheers

Resources