How to remove numbers from file extensions

I have many files in a directory with extensions like
.text(2) and .text(1).
I want to remove the numbers from the extensions, so the result should be just
.text and .text .
Can anyone please help me with a shell script for that?
I am using CentOS.

A pretty portable way of doing it would be this:
for i in *.text*; do mv "$i" "$(echo "$i" | sed 's/([0-9]\{1,\})$//')"; done
Loop through all files which end in .text followed by anything. Use sed to remove any parentheses containing one or more digits from the end of each filename.
If all of the numbers within the parentheses are single digits and you're using bash, you could also use built-in parameter expansion:
for i in *.text*; do mv "$i" "${i%([0-9])}"; done
The expansion removes any parentheses containing a single digit from the end of each filename.
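A quick way to sanity-check the expansion before touching any real files is to try it on a hypothetical filename (nothing on disk is modified here):

```shell
# Hypothetical filename, used only to demonstrate the ${var%pattern} expansion.
f='report.text(3)'
stripped=${f%([0-9])}   # strip a trailing "(digit)" suffix, if present
echo "$stripped"        # -> report.text
```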

Another way, without loops but still with sed (and all the regexps inside), is piping the generated commands to sh:
ls *text* | sed 's/\(.*\)\..*/mv \1* \1.text/' | sh
Example:
[...]$ ls
xxxx.text(1) yyyy.text(2)
[...]$ ls *text* | sed 's/\(.*\)\..*/mv \1* \1.text/' | sh
[...]$ ls
xxxx.text yyyy.text
Explanation:
Everything between \( and \) is captured and can be recalled with \1 (or \2, \3, ...: a consecutive number for each pair of parentheses used). So the code above captures all the characters before the first dot \. and, after that, composes a sequence of commands like this:
mv xxxx* xxxx.text
mv yyyy* yyyy.text
That output is then piped to sh, which executes each mv command.
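Since piping generated commands straight into sh is easy to get wrong, it helps to preview the output first by dropping the final | sh. A self-contained sketch, with hypothetical filenames fed through the same sed expression in place of ls:

```shell
# Preview the mv commands that would be generated (hypothetical filenames).
cmds=$(printf '%s\n' 'xxxx.text(1)' 'yyyy.text(2)' |
    sed 's/\(.*\)\..*/mv \1* \1.text/')
echo "$cmds"
```

Once the printed commands look right, appending | sh would execute them.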

The simplest way, if the files are all in the same folder:
rename 's/text\([0-9]+\)/text/' *.text*

Related

Search and delete links in markdown files

From time to time I run a link checker over my site, and the external links that return 404 are saved to a logfile.
Now I'm trying to delete those links from the markdown files automatically. I use multilingual websites, so I start by reading the logfile into an array.
IFS=$'\n'
link=( $(awk '{print $7}' $ext) )
for i in "${link[@]}"; do
grep -r $i content/* | sed -e 's/([^()]*)//g'
done
This command deletes the link and title inside (), but the [Example Text] part remains. I'm looking for a way to also remove the [], so that in the end I only get Example Text.
Now:
[Example Text](http://example.com "Example Title")
Desired result:
Example Text
Assumptions
The i in for i in "${link[@]}" will evaluate to a link like "http://example.com" on each loop iteration
Every section of your markdown files that we care about takes the form you described: [Example Text](http://example.com "Example Title")
The code
IFS=$'\n'
link=( $(awk '{print $7}' $ext) )
for i in "${link[@]}"; do
grep -ro "\[.*\].*${i}" content/* | grep -o '\[.*\]' | tr -d '[]'
done
Explanation
grep -ro "\[.*\].*${i}" content/*:
Recursive search to run on all files in a dir: grep -r ... content/*
Print only the text that applies to our regex: grep -o
Print anything that starts with [ followed by anything .* then a ] followed by the value of our loop variable ${i} (The current link): "\[.*\].*${i}"
From that output all we want is "Example Text" which lives between the brackets, so anything not between brackets needs to go grep -o '\[.*\]'
Finally, we want to remove those pesky brackets: tr -d '[]'
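The whole pipeline can be traced on a single hypothetical markdown line and link, without needing a content/ directory (grep -o is used here in place of the recursive grep -ro):

```shell
# One hypothetical markdown line and one hypothetical link value.
line='[Example Text](http://example.com "Example Title")'
i='http://example.com'
anchor_text=$(echo "$line" | grep -o "\[.*\].*${i}" | grep -o '\[.*\]' | tr -d '[]')
echo "$anchor_text"   # -> Example Text
```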
The immediate fix is to extend your sed regex.
sed 's/\[\([^][]*\)\]([^()]*)/\1/g'
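Applied to the sample line from the question, that sed expression keeps only the bracketed text:

```shell
md='[Example Text](http://example.com "Example Title")'
cleaned=$(echo "$md" | sed 's/\[\([^][]*\)\]([^()]*)/\1/g')
echo "$cleaned"   # -> Example Text
```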
But probably a much better fix is to replace all the matching links produced by the Awk script across content in a single pass.
find content -type f -exec \
sed -i 's%\[\([^][]*\)\](\('"$(
awk 'NR>1 { printf "\|" }
{ printf "%s", $7 }' "$ext")"'\))%\1%g' {} +
The Awk script produces a long regex like
http://one.example.net/nosuchpage\|http://two.exampe.org/404\|https://three.example.com/broken-link
from all the links in the input, and the sed script then replaces any links which match this regex in the parentheses after the square brackets. (Maybe you'll want to extend this to also permit a quoted string after the link before the closing round parenthesis, like in your example; I feel I am already guessing too many things about what you are actually hoping to accomplish.)
If you are on a *BSD platform (including macOS) you'll need to add an empty string argument after -i, like sed -i '' 's%...

Strange behaviour with Bash, Arrays and empty spaces

Problem:
Writing a bash script, I'm trying to import a list of products from a CSV file into an array:
#!/bin/bash
PRODUCTS=(`csvprintf -f "/home/test/data/input.csv" -x | grep "col2" | sed 's/<col2>//g' | sed 's/<\/col2>//g' | sed -n '1!p' | sed '$ d' | sed 's/ //g'`)
echo ${PRODUCTS[@]}
In the interactive shell, the result/output looks perfect as following:
burger
special fries
juice - 300ml
When I use exactly the same commands in a bash script, even when debugging with bash -x script.sh, at the echo ${PRODUCTS[@]} step the array contains all the file names located at /home/test/data/ plus:
burger
special
fries
juice
-
300ml
The array picks up the directory listing AND the newlines are mangled. This doesn't happen in the interactive shell (single command line).
Does anyone know how to fix that?
Looking at the docs for csvprintf, you're converting the csv into XML and then parsing it with regular expressions. This is generally a very bad idea.
You might want to install csvkit then you can do
csvcut -c prod input.csv | sed 1d
Or you could use a language that comes with a CSV parser module. For example, ruby
ruby -rcsv -e 'CSV.read("input.csv", :headers=>true).each {|row| puts row["prod"]}'
Whichever method you use, read the results into a bash array with this construct
mapfile -t products < <(command to extract the product data)
Then, to print the array elements:
for prod in "${products[@]}"; do echo "$prod"; done
# or
printf "%s\n" "${products[@]}"
The quotes around the array expansion are critical. If missing, you'll see one word per line.
Tip: don't use ALLCAPS variable names in the shell: leave those for the shell. One day you'll write PATH=something and then wonder why your script is broken.
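A self-contained sketch of the mapfile approach: printf stands in for the real extraction command, producing hypothetical product data, and the quoted expansion keeps multi-word entries intact.

```shell
# Hypothetical product data in place of the csvcut/ruby extraction step.
mapfile -t products < <(printf '%s\n' 'burger' 'special fries' 'juice - 300ml')

# Quoted expansion: one array element per line, spaces preserved.
printf '%s\n' "${products[@]}"
```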

Shell Script regex matches to array and process each array element

While I've handled this task easily in other languages, I'm at a loss for which commands to use when shell scripting (CentOS/bash).
I have some regex that produces many matches in a file I've read into a variable, and I would like to collect the matches into an array so I can loop over and process each entry.
For the regex, I typically use https://regexr.com/ to form my capture groups, and then throw that at JS/Python/Go to get an array and loop; but in shell scripting I'm not sure what I can use.
So far I've played with sed to find and replace all matches, but I don't know whether it can return an array of matches to loop over.
In short: take a regex, run it on a file, get an array back. I would love some help with shell scripting for this task.
EDIT:
Based on comments, put this together (not working via shellcheck.net):
#!/bin/sh
examplefile="
asset('1a/1b/1c.ext')
asset('2a/2b/2c.ext')
asset('3a/3b/3c.ext')
"
examplearr=($(sed 'asset\((.*)\)' $examplefile))
for el in ${!examplearr[*]}
do
echo "${examplearr[$el]}"
done
This works in bash on a mac:
#!/bin/sh
examplefile="
asset('1a/1b/1c.ext')
asset('2a/2b/2c.ext')
asset('3a/3b/3c.ext')
"
examplearr=(`echo "$examplefile" | sed -e '/.*/s/asset(\(.*\))/\1/'`)
for el in ${examplearr[*]}; do
echo "$el"
done
output:
'1a/1b/1c.ext'
'2a/2b/2c.ext'
'3a/3b/3c.ext'
Note the wrapping of $examplefile in quotes, and the use of sed to replace the entire line with the match. If there will be other content in the file, either on the same lines as the "asset" string or in other lines with no assets at all you can refine it like this:
#!/bin/sh
examplefile="
fooasset('1a/1b/1c.ext')
asset('2a/2b/2c.ext')bar
foobar
fooasset('3a/3b/3c.ext')bar
"
examplearr=(`echo "$examplefile" | grep asset | sed -e '/.*/s/^.*asset(\(.*\)).*$/\1/'`)
for el in ${examplearr[*]}; do
echo "$el"
done
and achieve the same result.
There are several ways to do this. I'd do with GNU grep with perl-compatible regex (ah, delightful line noise):
mapfile -t examplearr < <(grep -oP '(?<=[(]).*?(?=[)])' <<<"$examplefile")
for i in "${!examplearr[@]}"; do printf "%d\t%s\n" $i "${examplearr[i]}"; done
0 '1a/1b/1c.ext'
1 '2a/2b/2c.ext'
2 '3a/3b/3c.ext'
This uses the bash mapfile command to read lines from stdin and assign them to an array.
The bits you're missing from the sed command:
$examplefile is text, not a filename, so you have to send it to sed's stdin
sed's a funny little language with 1-character commands: you've given it the "a" command, which is inappropriate in this case.
you only want to output the captured parts of the matches, not every line, so you need the -n option, and you need to print somewhere: the p flag in s///p means "print the [line] if a substitution was made".
sed -n 's/asset\(([^)]*)\)/\1/p' <<<"$examplefile"
# or
echo "$examplefile" | sed -n 's/asset\(([^)]*)\)/\1/p'
Note that this returns values like ('1a/1b/1c.ext') -- with the parentheses. If you don't want them, add the -r or -E option to sed: among other things, that flips the meaning of ( and \(
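For instance, the same extraction with -E (assuming a sed that supports the -E option, as GNU and BSD sed both do) keeps the single quotes but drops the parentheses:

```shell
# Same hypothetical input as above.
examplefile="
asset('1a/1b/1c.ext')
asset('2a/2b/2c.ext')
"
# With -E, \( is a literal paren and (.*) is the capture group.
noparens=$(sed -nE 's/asset\((.*)\)/\1/p' <<<"$examplefile")
echo "$noparens"
```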

Bash edit last character in file at specified line

I use a .txt file as a database, like this:
is-program-installed= 0
is-program2-installed= 1
is-script3-runnig= 0
is-var5-declared= 1
But what if I uninstall program 2 and I want to set its database value to "0"?
One way to do it, using sed:
sed -e '/is-program2-/ s/.$/0/' -i file.txt
It works like this:
The s/.$/0/ replaces the last character with 0: the dot matches any character, and the $ matches the end of the line--hence .$ is the last character on the line.
The /is-program2-/ is a filter, so that the replacement is only executed for matching lines.
The filter pattern I used is a bit lazy: it's short but inaccurate. A longer, more strict solution would be:
sed -e '/^is-program2-installed= / s/.$/0/' -i file.txt
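A self-contained run against a hypothetical copy of the database file (GNU sed assumed, since -i is used without a backup suffix):

```shell
# Build a throwaway copy of the hypothetical database file.
dbfile=$(mktemp)
printf '%s\n' 'is-program-installed= 0' 'is-program2-installed= 1' > "$dbfile"

# Flip the last character of the matching line to 0.
sed -i '/^is-program2-installed= / s/.$/0/' "$dbfile"

row=$(grep '^is-program2-' "$dbfile")
echo "$row"   # -> is-program2-installed= 0
rm -f "$dbfile"
```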
You can wrap a sed command (see @janos' answer) in a function for ease of use:
Define:
function markuninstalled () {
PROGRAM=${1?Usage: markuninstalled PROGRAM [FILE]}
FILE=${2:-file.txt}
sed -e "/^is-$PROGRAM-/ s/.$/0/" -i.bak $FILE
}
and then, use it like this:
markuninstalled program2
and it will modify the default file file.txt, keeping a backup copy in file.txt.bak.

script for getting extensions of a file

I need to get all the file extension types in a folder. For instance, if the directory's ls gives the following:
a.t
b.t.pg
c.bin
d.bin
e.old
f.txt
g.txt
I should get this by running the script
.t
.t.pg
.bin
.old
.txt
I have a bash shell.
Thanks a lot!
See the BashFAQ entry on ParsingLS for a description of why many of these answers are evil.
The following approach avoids this pitfall (and, by the way, completely ignores files with no extension):
shopt -s nullglob
for f in *.*; do
printf '%s\n' ".${f#*.}"
done | sort -u
Among the advantages:
Correctness: ls behaves inconsistently and parsing its output can give wrong results. See the link at the top.
Efficiency: minimizes the number of subprocesses invoked (only one, sort -u, and even that could be removed if we used Bash 4's associative arrays to store the results)
Things that still could be improved:
Correctness: this will correctly discard newlines in filenames before the first . (which some other answers won't) -- but filenames with newlines after the first . will be treated as separate entries by sort. This could be fixed by using nulls as the delimiter, or by the aforementioned bash 4 associative-array storage approach.
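A sketch of that associative-array variant (bash 4+), with hypothetical filenames standing in for the *.* glob so the example is self-contained; the array keys deduplicate extensions without any external sort:

```shell
# Hypothetical filenames in place of: for f in *.*
declare -A seen
for f in 'a.t' 'b.t.pg' 'c.bin'; do
    seen[".${f#*.}"]=1   # key = everything from the first dot onward
done

# Key order is unspecified, so sort only for stable display.
exts=$(printf '%s\n' "${!seen[@]}" | sort)
echo "$exts"
```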
try this:
ls -1 | sed 's/^[^.]*\(\..*\)$/\1/' | sort -u
ls lists files in your folder, one file per line
sed magic extracts extensions
sort -u sorts extensions and removes duplicates
sed magic reads as:
s/ / /: substitutes whatever is between first and second / by whatever is between second and third /
^: match beginning of line
[^.]: match any character that is not a dot
*: match it as many times as possible
\( and \): remember whatever is matched between these two parentheses
\.: match a dot
.: match any character
*: match it as many times as possible
$: match end of line
\1: this is what has been matched between parentheses
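The same sed expression can be exercised without ls by feeding it a couple of hypothetical names directly:

```shell
found=$(printf '%s\n' 'a.t' 'b.t.pg' | sed 's/^[^.]*\(\..*\)$/\1/' | sort -u)
echo "$found"
```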
People are really over-complicating this - particularly the regex:
ls | grep -o "\..*" | uniq
ls - get all the files
grep -o "\..*" - -o only show the match; "\..*" match at the first "." & everything after it
uniq - don't print duplicates but keep the same order
You can also sort if you like, but sorting doesn't match the example output.
This is what happens when you run it:
> ls -1
a.t
a.t.pg
c.bin
d.bin
e.old
f.txt
g.txt
> ls | grep -o "\..*" | uniq
.t
.t.pg
.bin
.old
.txt
