ksh: remove last extension from a multiple extension filename - file

I have a filename in the format dir1/dir2/filename.txt.org and I like to rename this to dir1/dir2/filename.txt . how can this be done. I tried 'cut' with '.' separator but it also removes .txt

You can try korn shell variable expansion formats, instead of using a subprocess (e.g. cut) . This can be much faster.
example:
var1=dir1/dir2/filename.txt.org
var2=${var1%.*}
If you now print $var2 its value will be dir1/dir2/filename.txt
The % tells it to delete the smallest matching rightmost match for .* (which means anything following the rightmost period character).
${variable%pattern} - return the value of variable without the smallest ending portion that matches pattern.
Other variable expansion formats are available, it is worthwhile to study the docs.

Related

Replace a number in a file using array data, bash

I'm not an expert in bash coding and I'm trying to do one interative-like code to help me in my work.
I have a file that contains some numbers (coordinates), and I'm trying to make a code to read some specific numbers from the file and then store them in an array. Modify that array using some arithmetic operation and then replace the numbers in the original file with the modified array. So far I've done everything except replacing the numbers in the file, I tried using sed but it does not change the file. The original numbers are stored in an array called "readfile" and the new numbers are stored in an array called "d".
I'm trying to use sed in this way: sed -i 's/${readfile[$j]}/${d[$k]}/' file.txt
And I loop j and k to cover all the numbers in the arrays. Everything seems to work but the file is not being modified. After some digging, I'm noticing that sed is not reading the value of the array, but I do not know how to fix that.
Your help is really appreciated.
When a file isn't modified by sed -i, it means sed didn't find any matches to modify. Your pattern is wrong somehow.
After using " instead of ' so that the variables can actually be evaluated inside the string, look at the contents of the readfile array and check whether it actually matches the text. If it seems to match, look for special characters in the pattern, characters that would mean something specific to sed (the most common mistake is /, which will interfere with the search command).
The fix for special characters is either to (1) escape them, e.g. \/ instead of just /, or (2) (and especially for /) to use another delimiter for the search/replace command (instead of s/foo/bar/ you can use s|foo|bar| or s,foo,bar, etc - pretty much any delimiter works, so you can pick one that you know isn't in the pattern string).
If you post data samples and more of your script, we can look at where you went wrong.

Split array element delimited with '.'

I am trying to read below CSV file content line by line in Perl.
CSV File Content:
A7777777.A777777777.XXX3604,XXX,3604,YES,9
B9694396.B216905785.YYY0018,YYY,0018,YES,13
C9694396.C216905785.ZZZ0028,ZZZ,0028,YES,16
I am able to split line content using below code and able to verify the content too:
#column_fields1 = split(',', $_);
print $column_fields1[0],"\n";
I am also trying to find the second part on the first column of CSV file (i.e., A777777777 or B216905785 or C216905785) – the first column delimited with . using the below code and I am unable to get it.
Instead, just a new line printed.
my ($v1, $v2, $v3) = split(".", $column_fields1[0]);
print $v2,"\n";
Can someone suggest me how to split the array element and get the above value?
On my functionality, I need the first column value altogether at someplace and just only the second part at someplace.
Below is my code:
use strict;
use warnings;
my $dailybillable_tab_section1_file = "./sql/demanding_01_T.csv";
open(FILE, $dailybillable_tab_section1_file) or die "Could not read from $dailybillable_tab_section1_file, program halting.";
my #column_fields1;
my #column_fields2;
while (<FILE>)
{
chomp;
#column_fields1 = split(',', $_);
print $column_fields1[0],"\n";
my ($v1, $v2, $v3) = split(".",$column_fields1[0]);
print $v2,"\n";
if($v2 ne 'A777777777')
{
…
…
…
}
else
{
…
…
…
}
}
close FILE;
split takes a regex as its first argument. You can pass it a string (as in your code), but the contents of the string will simply be interpreted as a regex at runtime.
That's not a problem for , (which has no special meaning in a regex), but it breaks with . (which matches any (non-newline) character in a regex).
Your attempt to fix the problem with split "\." fails because "\." is identical to ".": The backslash has its normal string escape meaning, but since . isn't special in strings, escaping it has no effect. You can see this by just printing the resulting string:
print "\.\n"; # outputs '.', same as print ".\n";
That . is then interpreted as a regex, causing the problems you have observed.
The normal fix is to just pass a regex to split:
split /\./, $string
Now the backslash is interpreted as part of the regex, forcing . to match itself literally.
If you really wanted to pass a string to split (I'm not sure why you'd want to do that), you could also do it like this:
split "\\.", $string
The first backslash escapes the second backslash, giving a two character string (\.), which when interpreted as a regex means the same thing as /\./.
If you look at the documentation for split(), you'll see it gives the following ways to call the function:
split /PATTERN/,EXPR,LIMIT
split /PATTERN/,EXPR
split /PATTERN/
split
In three of those examples, the first argument to the function is /PATTERN/. That is, split() expects to be given a regular expression which defines how the input string is split apart.
It's very important to realise that this argument is a regex, not a string. Unfortunately, Perl's parser doesn't insist on that. It allows you to use a first argument which looks like a string (as you have done). But no matter how it looks, it's not a string. It's a regex.
So you have confused yourself by using code like this:
split(".",$COLUMN_FIELDS1[0])
If you had made the first argument look like a regex, then you would be more likely to realise that the first argument is a regex and that, therefore, a dot needs to be escaped to prevent it being interpreted as a metacharacter.
split(/\./, $COLUMN_FIELDS1[0])
Update: It's generally accepted among Perl programmers, that variable with upper case names are constants and don't change their values. By using upper case names for standard variables, you are likely to confuse the next person who edits your code (who could well be you in six months time).

Using exec on each file in a bash script

I'm trying to write a basic find command for a assignment (without using find). Right now I have an array of files I want to exec something on. The syntax would look like this:
-exec /bin/mv {} ~/.TRASH
And I have an array called current that holds all of the files. My array only holds /bin/mv, {}, and ~/.TRASH (since I shift the -exec out) and are in an array called arguments.
I need it so that every file gets passed into {} and exec is called on it.
I'm thinking I should use sed to replace the contents of {} like this (within a for loop):
for i in "${current[#]}"; do
sed "s#$i#{}"
#exec stuff?
done
How do I exec the other arguments though?
You can something like this:
cmd='-exec /bin/mv {} ~/.TRASH'
current=(test1.txt test2.txt)
for f in "${current[#]}"; do
eval $(sed "s/{}/$f/;s/-exec //" <<< "$cmd")
done
Be very careful with eval command though as it can do nasty things if input comes from untrusted sources.
Here is an attempt to avoid eval (thanks to #gniourf_gniourf for his comments):
current=( test1.txt test2.txt )
arguments=( "/bin/mv" "{}" ~/.TRASH )
for f in "${current[#]}"; do
"${arguments[#]/\{\}/$f}"
done
Your are lucky that your design is not too bad, that your arguments are in an array.
But you certainly don't want to use eval.
So, if I understand correctly, you have an array of files:
current=( [0]='/path/to/file'1 [1]='/path/to/file2' ... )
and an array of arguments:
arguments=( [0]='/bin/mv' [1]='{}' [2]='/home/alex/.TRASH' )
Note that you don't have the tilde here, since Bash already expanded it.
To perform what you want:
for i in "${current[#]}"; do
( "${arguments[#]//'{}'/"$i"}" )
done
Observe the quotes.
This will replace all the occurrences of {} in the fields of arguments by the expansion of $i, i.e., by the filename1, and execute this expansion. Note that each field of the array will be expanded to one argument (thanks to the quotes), so that all this is really safe regarding spaces, glob characters, etc. This is really the safest and most correct way to proceed. Every solution using eval is potentially dangerous and broken (unless some special quotings is used, e.g., with printf '%q', but this would make the method uselessly awkward). By the way, using sed is also broken in at least two ways.
Note that I enclosed the expansion in a subshell, so that it's impossible for the user to interfere with your script. Without this, and depending on how your full script is written, it's very easy to make your script break by (maliciously) changing some variables stuff or cd-ing somewhere else. Running your argument in a subshell, or in a separate process (e.g., separate instance of bash or sh—but this would add extra overhead) is really mandatory for obvious security reasons!
Note that with your script, user has a direct access to all the Bash builtins (this is a huge pro), compared to some more standard find versions2!
1 Note that POSIX clearly specifies that this behavior is implementation-defined:
If a utility_name or argument string contains the two characters "{}", but not just the two characters "{}", it is implementation-defined whether find replaces those two characters or uses the string without change.
In our case, we chose to replace all occurrences of {} with the filename. This is the same behavior as, e.g., GNU find. From man find:
The string {} is replaced by the current file name being processed everywhere it occurs in the arguments to the command, not just in arguments where it is alone, as in some versions of find.
2 POSIX also specifies that calling builtins is not defined:
If the utility_name names any of the special built-in utilities (see Special Built-In Utilities), the results are undefined.
In your case, it's well defined!
I think that trying to implement (in pure Bash) a find command is a wonderful exercise that should teach you a lot… especially if you get relevant feedback. I'd be happy to review your code!

How do I find out if a file name has any extension in Unix?

I need to find out if a file or directory name contains any extension in Unix for a Bourne shell scripting.
The logic will be:
If there is a file extension
Remove the extension
And use the file name without the extension
This is my first question in SO so will be great to hear from someone.
The concept of an extension isn't as strictly well-defined as in traditional / toy DOS 8+3 filenames. If you want to find file names containing a dot where the dot is not the first character, try this.
case $filename in
[!.]*.*) filename=${filename%.*};;
esac
This will trim the extension (as per the above definition, starting from the last dot if there are several) from $filename if there is one, otherwise no nothing.
If you will not be processing files whose names might start with a dot, the case is superfluous, as the assignment will also not touch the value if there isn't a dot; but with this belt-and-suspenders example, you can easily pick the approach you prefer, in case you need to extend it, one way or another.
To also handle files where there is a dot, as long as it's not the first character (but it's okay if the first character is also a dot), try the pattern ?*.*.
The case expression in pattern ) commands ;; esac syntax may look weird or scary, but it's quite versatile, and well worth learning.
I would use a shell agnostic solution. Runing the name through:
cut -d . -f 1
will give you everything up to the first dot ('-d .' sets the delimeter and '-f 1' selects the first field). You can play with the params (try '--complement' to reverse selection) and get pretty much anything you want.

equivalent for regexp /<rr>(.*?)</rr>/<test>$1</test>/gi

I want to write simple program in C equivalent to the regular expression:
/<rr>(.*?)<\/rr>/<test>$1<\/test>/gi.
Does anyone have examples?
It helps if you understand what the regex is supposed to do.
The pattern
The parentheses (...) indicate the beginning and end of a group. They also create a backreference to be used later.
The . is a metacharacter that matches any character.
The * repetition specifier can be used to match "zero-or-more times" of the preceding pattern.
The ? is used here to make the preceding quantifier "lazy" instead of "greedy."
The $1 is likely (depends on the language) a reference to the first capture group. In this case it would be everything matched by (.*?)
The /g modifier at the end is used to perform a global match (find all matches rather than stopping after the first match).
The /i modifier is used to make case-insensitive matches
References
regular-expressions.info, Grouping, Dot, Repetition: *+?{…}

Resources