How do I find out if a file name has any extension in Unix? - file

I need to find out whether a file or directory name contains an extension, for a Bourne shell script on Unix.
The logic will be:
If there is a file extension
Remove the extension
And use the file name without the extension
This is my first question on SO, so it will be great to hear from someone.

On Unix, the concept of an extension isn't as strictly defined as it is in traditional / toy DOS 8+3 filenames. If you want to match file names containing a dot where the dot is not the first character, try this:
case $filename in
  [!.]*.*) filename=${filename%.*};;
esac
This trims the extension (per the above definition, starting from the last dot if there are several) from $filename if there is one, and otherwise does nothing.
If you will not be processing files whose names might start with a dot, the case wrapper is superfluous, since the assignment also leaves the value untouched when there is no dot; but with this belt-and-suspenders example you can easily pick whichever approach you prefer, in case you need to extend it one way or another.
To also trim an extension from names that start with a dot (as long as the dot marking the extension isn't the very first character of the name), try the pattern ?*.* instead.
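For illustration, a minimal sketch of that variant (the sample names are hypothetical):
case $filename in
  ?*.*) filename=${filename%.*};;
esac
# ".profile.bak" becomes ".profile", while a plain ".profile" is left untouched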
The case expression in pattern ) commands ;; esac syntax may look weird or scary, but it's quite versatile, and well worth learning.

I would use a shell-agnostic solution. Running the name through:
cut -d . -f 1
will give you everything up to the first dot ('-d .' sets the delimiter and '-f 1' selects the first field). You can play with the parameters (try '--complement' to invert the selection) and get pretty much anything you want.
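As a quick illustration (the variable name and the sample value are just assumptions for this sketch):
filename="archive.tar.gz"
stem=$(printf '%s\n' "$filename" | cut -d . -f 1)
echo "$stem"   # prints "archive"
Note that this keeps only the part before the first dot, which is not the same as ${filename%.*} above (that trims from the last dot).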

Related

Replace a number in a file using array data, bash

I'm not an expert in bash and I'm trying to write a script to help me with my work.
I have a file that contains some numbers (coordinates). I'm trying to write code that reads some specific numbers from the file and stores them in an array, modifies that array with some arithmetic, and then replaces the numbers in the original file with the modified values. So far I've done everything except replacing the numbers in the file; I tried using sed but it does not change the file. The original numbers are stored in an array called "readfile" and the new numbers in an array called "d".
I'm trying to use sed in this way: sed -i 's/${readfile[$j]}/${d[$k]}/' file.txt
I loop over j and k to cover all the numbers in the arrays. Everything seems to work, but the file is not modified. After some digging I noticed that sed is not seeing the values of the arrays, but I do not know how to fix that.
Your help is really appreciated.
When a file isn't modified by sed -i, it means sed didn't find any matches to modify. Your pattern is wrong somehow.
After using " instead of ' so that the variables can actually be evaluated inside the string, look at the contents of the readfile array and check whether it actually matches the text. If it seems to match, look for special characters in the pattern, characters that would mean something specific to sed (the most common mistake is /, which will interfere with the search command).
The fix for special characters is either to (1) escape them, e.g. \/ instead of just /, or (2) (and especially for /) to use another delimiter for the search/replace command (instead of s/foo/bar/ you can use s|foo|bar| or s,foo,bar, etc - pretty much any delimiter works, so you can pick one that you know isn't in the pattern string).
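Putting both fixes together, a hedged sketch (reusing file.txt and the loop indices from the question):
# double quotes let the array elements expand; '|' avoids clashing with any '/' in the values
sed -i "s|${readfile[$j]}|${d[$k]}|" file.txt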
If you post data samples and more of your script, we can look at where you went wrong.

looping an ffmpeg command that joins two files

The command ffmpeg -i file-1.mp4 -vf ass=file-1a.ass burned-1.mp4
works to burn the file-1a.ass subtitles onto the file-1.mp4 video.
But I have to repeat the same command for over 40 different videos and subtitles, and each time I have to wait for the output to render.
So perhaps there is a way to automatically run the same command on all the files.
Looking for an answer, I found the loop construct
for f in *; do ffmpeg $f;
But I am confused about how to use it with two files, the .mp4 and the .ass file, and also the output file, which should have the same number.
I imagine I should give the same name to each pair of files, such as:
1.mp4 1.ass
2.mp4 2.ass
3.mp4 3.ass
etc
and then
for f in *; do ffmpeg -i $f.mp4 -vf ass=$f.ass $f-output.mp4
But I have no clear idea
You have the right idea, but it won't work as written: the loop will execute with f == 1.mp4, then again with f == 1.ass, and so on.
So you want to modify the loop to iterate only over .mp4 files. Then you want to strip the .mp4 extension from the value of f, that is, strip the last 4 characters from the value of f, using ${f:0: -4} (this means "take a substring of f, starting at character 0 and stopping 4 characters before the end").
You obviously want to terminate the loop with done. I also suggest wrapping the parameters in quotes, to prevent word splitting (that is, if the filenames contain spaces or other special characters, they would otherwise be split into multiple arguments to ffmpeg).
Putting it all together:
for f in *.mp4; do f=${f%.*}; ffmpeg -i "$f.mp4" -vf ass="$f.ass" "$f-output.mp4"; done
Of course, once you have run this, you need to get rid of all the output files before you can run it again. Or you can just put the output files in a different directory to begin with.
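For example, a minimal variant of the loop that writes into a separate directory (the directory name output is just an assumption):
mkdir -p output
for f in *.mp4; do f=${f%.*}; ffmpeg -i "$f.mp4" -vf ass="$f.ass" "output/$f.mp4"; done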
Edit: Another user posted an answer, which seems to have been deleted. It was a good answer but lacked explanation. It was basically the same as my answer, except that it used ${f%.mp4} to strip the .mp4 extension. My answer is probably slightly more complex but slightly more efficient, so it’s basically a matter of personal preference.
Edit 2: Based on the link provided by llogan’s comment, I have made these changes:
Remove the quotes in the assignment, as assignments are not subject to word splitting (this is also stated in the bash man page).
Use ${f%.*} to strip the extension. This strips a dot followed by any sequence of characters from the end. It looks for the shortest possible match, so it’s really looking for a dot followed by any sequence of non-dot characters at the end.
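A quick illustration of that shortest-match behavior (the file name is hypothetical):
f=video.part.1.mp4
echo "${f%.*}"    # prints "video.part.1" - only the trailing ".mp4" is stripped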

Reversing shell-style brace expansion

Brace expansion takes a pattern and expands it. For example:
sp{el,il,al}l
Expands to:
spell spill spall
Is there an algorithm (potentially with a JavaScript implementation) to do the reverse in a way that minimizes the constructed string?
i.e., take in an array [spell spill spall] and return a string "sp{e,i,a}ll"
Minimizing the resulting string can be done in many different ways, but since you mention Bash, I'll show the Bash way, which is not the most optimized one.
Yes, there is a Bash way! The Bash developers included it as the readline command complete-into-braces. When using Bash interactively, if you hit Meta-{ (which is either Alt-{ or Esc then { on my machine), all possible completions are grouped into a single brace expansion.
$ echo /usr/
bin/ games/ include/ lib/ local/ sbin/ share/ src/
$ echo /usr/{bin,games,include,l{ib,ocal},s{bin,hare,rc}}
Above, the first time I hit Tab to show all possible completions, and the second time I hit Alt-{.
Back to your question: you are looking for an algorithm. Obviously you may find something in the Bash source code. The function you are looking for is really_munge_braces() in bracecomp.c.
As requested in the original question, node-brace-compression contains a JavaScript implementation. E.g.
var compress = require('brace-compression');
var data = [
  'foo-1',
  'foo-2',
  'foo-3'
];
console.log(compress(data));
// => "foo-{1..3}"
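Whichever implementation you use, you can sanity-check the result by letting the shell expand the compressed pattern back into the original list:
$ echo foo-{1..3}
foo-1 foo-2 foo-3
$ echo sp{e,i,a}ll
spell spill spall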

Using exec on each file in a bash script

I'm trying to write a basic find command for an assignment (without using find). Right now I have an array of files I want to exec something on. The syntax would look like this:
-exec /bin/mv {} ~/.TRASH
And I have an array called current that holds all of the files. The remaining pieces, /bin/mv, {}, and ~/.TRASH (since I shift the -exec out), are in an array called arguments.
I need it so that every file gets passed into {} and exec is called on it.
I'm thinking I should use sed to replace the contents of {} like this (within a for loop):
for i in "${current[#]}"; do
sed "s#$i#{}"
#exec stuff?
done
How do I exec the other arguments though?
You can do something like this:
cmd='-exec /bin/mv {} ~/.TRASH'
current=(test1.txt test2.txt)
for f in "${current[@]}"; do
  eval $(sed "s/{}/$f/;s/-exec //" <<< "$cmd")
done
Be very careful with the eval command though, as it can do nasty things if the input comes from untrusted sources.
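As a purely hypothetical illustration of that risk (do not run this), a crafted file name is enough to smuggle in a second command:
# hypothetical, DO NOT RUN
current=( 'test1.txt; rm -rf ~' )
# after the sed substitution, eval would see:
#   /bin/mv test1.txt; rm -rf ~ ~/.TRASH
# and would run the rm as a separate command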
Here is an attempt to avoid eval (thanks to @gniourf_gniourf for his comments):
current=( test1.txt test2.txt )
arguments=( "/bin/mv" "{}" ~/.TRASH )
for f in "${current[@]}"; do
  "${arguments[@]/\{\}/$f}"
done
You are lucky that your design is not too bad, in that your arguments are already in an array.
But you certainly don't want to use eval.
So, if I understand correctly, you have an array of files:
current=( [0]='/path/to/file1' [1]='/path/to/file2' ... )
and an array of arguments:
arguments=( [0]='/bin/mv' [1]='{}' [2]='/home/alex/.TRASH' )
Note that you don't have the tilde here, since Bash already expanded it.
To perform what you want:
for i in "${current[#]}"; do
( "${arguments[#]//'{}'/"$i"}" )
done
Observe the quotes.
This will replace all the occurrences of {} in the fields of arguments by the expansion of $i, i.e., by the filename[1], and execute this expansion. Note that each field of the array will be expanded to exactly one argument (thanks to the quotes), so all this is really safe regarding spaces, glob characters, etc. This is really the safest and most correct way to proceed. Every solution using eval is potentially dangerous and broken (unless some special quoting is used, e.g., with printf '%q', but this would make the method uselessly awkward). By the way, using sed is also broken in at least two ways.
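A small, hypothetical illustration of that per-field substitution (printing one resulting argument per line shows that a file name containing a space stays a single argument):
arguments=( "/bin/mv" "{}" "$HOME/.TRASH" )
i='my file.txt'
printf '%s\n' "${arguments[@]//'{}'/"$i"}"
# /bin/mv
# my file.txt
# /home/alex/.TRASH   (or wherever $HOME points)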
Note that I enclosed the expansion in a subshell, so that it's impossible for the user to interfere with your script. Without this, and depending on how your full script is written, it's very easy to make your script break by (maliciously) changing some variables or cd-ing somewhere else. Running your argument in a subshell, or in a separate process (e.g., a separate instance of bash or sh, though this would add extra overhead), is really mandatory for obvious security reasons!
Note that with your script, the user has direct access to all the Bash builtins (this is a huge pro), compared to some more standard find versions[2]!
[1] Note that POSIX clearly specifies that this behavior is implementation-defined:
If a utility_name or argument string contains the two characters "{}", but not just the two characters "{}", it is implementation-defined whether find replaces those two characters or uses the string without change.
In our case, we chose to replace all occurrences of {} with the filename. This is the same behavior as, e.g., GNU find. From man find:
The string {} is replaced by the current file name being processed everywhere it occurs in the arguments to the command, not just in arguments where it is alone, as in some versions of find.
[2] POSIX also specifies that calling builtins is not defined:
If the utility_name names any of the special built-in utilities (see Special Built-In Utilities), the results are undefined.
In your case, it's well defined!
I think that trying to implement (in pure Bash) a find command is a wonderful exercise that should teach you a lot… especially if you get relevant feedback. I'd be happy to review your code!

implementing globbing in a shell prototype

I'm implementing a Linux shell for my weekend assignment and I am having some problems implementing wildcard matching as a shell feature. As we all know, shells are a complete language by themselves, e.g. bash, ksh, etc. I don't need to implement the complete feature set, like control structures, jobs, etc. But how do I implement the *?
A quick analysis gives you the following result:
echo *
lists all the files in the current directory. Is this the only logical manifestation of the feature in the shell? I mean, not considering the language-specific features of bash, is this what a shell does internally: replace a * with all the files in the current directory matching the pattern?
Also, I have heard about Perl Compatible Regular Expressions, but it seems too complex to pull in a third-party library.
Any suggestions, links, etc.? I will try to look at the bash source code as well.
This is called "globbing", and the library function that performs it has the same name: glob(3).
Yes, that's what the shell does. It replaces '*' with all the file and folder names in the cwd that match. It is in fact a very basic pattern language (not full regular expressions), supporting mainly '?' and '*', matched against file and folder names in the cwd.
Note that a backslashed \* and a '*' enclosed in single or double quotes (' or ") are not replaced (the backslash and the quotes are removed before the arguments are passed to the command being executed).
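A quick interactive illustration (assuming a directory that contains just a.txt and b.txt):
$ ls
a.txt  b.txt
$ echo *
a.txt b.txt
$ echo \* '*' "*"
* * *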
If you want more control than glob gives you, the standard function fnmatch performs just the glob-style matching of a pattern against a single string, without reading any directories.
Note that shells also perform word expansion (e.g. "~" → "/home/user"), which should be done before glob expansion if you're doing filename matching manually. (Or use wordexp.)
