Remove spaces from all elements of an array - arrays

I have a sample file named 'test.in' like the one below, with leading and trailing spaces:
Hi | hello how | are you?
I need | to remove | leading & trailing
spaces | where ever | it's located
I need to put the elements separated by "|" into an array, with "\n" as the main delimiter.
No element in the array should have leading or trailing spaces; only spaces between characters are allowed. I'm using sample code to test the results before I put the code in my primary deployment.
Portion of sample script:
OIFS="$IFS"
IFS=$'\n'
while read LINE
do
IFS='|'
my_tmpary=($LINE)
echo ${my_tmpary[@]}
my_ary=`echo ${my_tmpary[@]} | awk '$1=$1'`
echo ${my_ary[@]}
done < test.in
I would like NOT to use a loop to clean up the extra spaces.
I tried sed, awk and tr by trial and error, but without success so far.
my_ary[@] should look like this:
Hi hello how are you?
I need to remove leading & trailing
spaces where ever it's located

You can (ab)use the fact that read will remove leading and trailing spaces when IFS is default:
while read -r line; do
printf "%s\n" "$line" # Leading and trailing spaces are removed.
done < test.in
Alternatively, you can use sed for this task:
sed 's/^[[:space:]]*\|[[:space:]]*$//g' test.in
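The read trick above can also be wrapped in a small function to trim a single string. This is only a minimal sketch: the function name trim is hypothetical, and it assumes bash (for the here-string).

```shell
#!/usr/bin/env bash
# trim is a hypothetical helper name, not part of the original answer.
# With the default IFS (space, tab, newline), read drops leading and
# trailing blanks when it assigns the whole line to a single variable.
trim() {
    local trimmed
    IFS=$' \t\n' read -r trimmed <<< "$1"
    printf '%s' "$trimmed"
}

# Inner spaces and the pipe are left untouched.
printf '[%s]\n' "$(trim '   spaces | where ever   ')"
```

Spaces around the pipes are not touched by this; it only strips the two ends of the string.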

sed -E 's/[\t]{1,}/ /g;s/^ *| *$//g;s/[ ]{2,}/ /g;s/ *\| */ /g;' test.in
should do it. For example:
$ my_ary=$(sed -E 's/[\t]{1,}/ /g;s/^ *| *$//g;s/[ ]{2,}/ /g;s/ *\| */ /g;' test.in)
$ echo "$my_ary"
Hi hello how are you?
I need to remove leading & trailing
spaces where ever it's located
This lines has many tabs
solves your problem.
Notes
1. s/[\t]{1,}/ /g converts each run of tabs (\t) to a single space.
2. s/^ *| *$//g removes the leading and trailing spaces.
3. s/[ ]{2,}/ /g squeezes multiple spaces into one.
4. s/ *\| */ /g replaces each | and the spaces around it with a single space.
5. The -E option enables extended regular expressions in sed.
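Note that $(...) stores the sed output in one string, not in an array. If a real array with one element per line is wanted, a sketch along the following lines may help. It assumes bash 4+ for mapfile, the helper name clean_lines is hypothetical, and the sed script is a simplified variant of the one above (pipes become spaces first, then all whitespace is squeezed).

```shell
#!/usr/bin/env bash
# clean_lines is a hypothetical helper: it reads lines on stdin and
# 1. turns every | into a space,
# 2. squeezes each run of whitespace (spaces or tabs) into one space,
# 3. strips the single space left at the start or end of the line.
clean_lines() {
    sed -E 's/\|/ /g; s/[[:space:]]+/ /g; s/^ //; s/ $//'
}

# mapfile -t stores each cleaned line as one array element (bash 4+).
# The printf stands in for the test.in file from the question.
mapfile -t my_ary < <(
    printf '%s\n' \
        '  Hi | hello how | are you?  ' \
        'I need   | to remove |  leading & trailing ' |
    clean_lines
)

printf '[%s]\n' "${my_ary[@]}"
```

With the real file, the printf would be replaced by `clean_lines < test.in` inside the process substitution.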

In AWK:
awk '{gsub(/^ *| *$/,"",$0); gsub(/ *\| *| +/," ",$0); print $0}' test.in
gsub(/^ *| *$/,"",$0) removes leading and trailing spaces,
gsub(/ *\| *| +/," ",$0) replaces space-pipe-space combinations and runs of spaces with a single space,
print $0 prints the whole record. $0 could be omitted, as @mona_sax commented, but I left it in the code for the sake of clarity.
Of course, you could also loop over the pipe-delimited fields and trim each one separately:
awk -F\| -v OFS=" " ' # set input field separator to "|" and output separator to " "
function trim(str) {
gsub(/^ *| *$/,"",str); # remove leading and trailing space from the field
gsub(/ +/," ",str); # mind those multiple spaces as well
return str # return cleaned field
}
{
for(i=1;i<=NF;i++) # loop thru all pipe separated fields
printf "%s%s", trim($i),(i<NF?OFS:ORS) # print field and OFS or
} # output record separator in the end of record
' test.in

Related

how to add \n before 4th pipe and after last double Quotes in a file in unix

I have a line in a file, like:
1|4|ab|"abnchf "dnvjnkjf" fdvjnfkjnv" 2|12|df|"dskfnkfv "A"
I want to break it into two rows by adding \n before the 4th pipe and after the last double quote.
It should look like:
1|4|ab|"abnchf "dnvjnkjf" fdvjnfkjnv"
2|12|df|"dskfnkfv "A"
I have tried a sed command, but it's not working:
sed 's/\(|[^|]*\)(|[^|]*\)(|[^|]*\)|/\1\n|/g'
You may use
sed 's/\([^|]*|\)\{3\}[^|]* /&\n/' file > newfile
Details
\([^|]*|\)\{3\} - three consecutive occurrences of
  [^|]* - 0+ chars other than |
  | - a pipe symbol
[^|]* - 0+ chars other than |
(space) - a literal space character
The replacement pattern is &\n, the whole match (&) and a newline (\n).
The replacement is only done once per line since I removed the g option.
To avoid overescaping, you may use a POSIX ERE based sed:
sed -E 's/([^|]*\|){3}[^|]* /&\n/' file > newfile
where you do not need to escape capturing parentheses and range/interval quantifier braces (but you have to escape a literal | char).
This might work for you (GNU sed):
sed 's/[^ |]*|/\n&/4' file
Insert a newline before the fourth field delimited by |.
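As a quick check, the last command can be run against the sample line from the question. This sketch assumes GNU sed, since both \n in the replacement and the numeric flag 4 are used as in the answer above; the variable names are only for illustration.

```shell
#!/usr/bin/env bash
# Sample line taken from the question.
line='1|4|ab|"abnchf "dnvjnkjf" fdvjnfkjnv" 2|12|df|"dskfnkfv "A"'

# [^ |]*| matches a run of non-space, non-pipe characters followed by a pipe.
# The trailing 4 applies the substitution to the 4th such match only,
# so the newline lands right before "2|".
result=$(printf '%s\n' "$line" | sed 's/[^ |]*|/\n&/4')

printf '%s\n' "$result"
```

The space that separated the two records stays at the end of the first output line.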

Bash: set IFS to Space after specific character only?

I'm using IFS=', ' to split a string of comma-delimited text into an array. The problem is that occasionally one of the comma-delimited items contains a space following a :. The resulting array contains that item as two separate array elements. Is it possible to set IFS to only split ', ' and ignore a comma-delimited item that contains ': ' (or any other character for that matter)?
See the comma-delimited string returned from the first command below, note the second item has the :. See the MarkerNames[1] and MarkerNames[2] to see the unwanted split in the second command below.
$ exiftool -s3 -TracksMarkersName audioFile.wav
Marker1, Tempo: 120.0, Silence, Marker2, Silence.1, Marker3, Silence.2, Marker4, Silence.3, Marker5
$ IFS=', ' read -r -a MarkerNames <<< $(exiftool -s3 -TracksMarkersName audioFile.wav)
$ declare -p MarkerNames
declare -a MarkerNames='([0]="Marker1" [1]="Tempo:" [2]="120.0" [3]="Silence" [4]="Marker2" [5]="Silence.1" [6]="Marker3" [7]="Silence.2" [8]="Marker4" [9]="Silence.3" [10]="Marker5")'
IFS contains an enumeration of the characters which each can be a field separator. So ", " says "any run of spaces or commas separates my fields".
The simplest workaround I think would be to preprocess the output so you get the breaks where you want them.
IFS='~' MarkerNames=($(exiftool -s3 -TracksMarkersName audioFile.wav | sed 's/, /~/g'))
This of course requires you to find another IFS value which doesn't occur in your data. If Bash 4+ is available, maybe use a newline and readarray.
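The newline and readarray idea could look roughly like the sketch below. It assumes bash 4+, and it uses a literal sample string (shortened from the exiftool output in the question) so that it runs without exiftool; the variable names raw and nl are only for illustration.

```shell
#!/usr/bin/env bash
# Stand-in for: raw=$(exiftool -s3 -TracksMarkersName audioFile.wav)
raw='Marker1, Tempo: 120.0, Silence, Marker2'

# Replace every ", " with a newline, then read one element per line.
# IFS is not involved, so the space after "Tempo:" survives.
nl=$'\n'
readarray -t MarkerNames <<< "${raw//, /$nl}"

declare -p MarkerNames
```

This only works if no item contains a newline or the two-character sequence ", " itself.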
You could split on commas and remove the leading / trailing spaces afterwards:
IFS=',' read -r -a MarkerNames <<< $(exiftool -s3 -TracksMarkersName audioFile.wav)
shopt -s extglob # Needed for extended glob
MarkerNames=( "${MarkerNames[@]/#*( )}" ) # Remove leading spaces
MarkerNames=( "${MarkerNames[@]/%*( )}" ) # Remove trailing spaces

Spaces in array content getting broken with grep

I am using an array to deal with spaces in the lines of my file. But when I use grep to filter by the array's value, it breaks because of the spaces.
For example, my line looks like this:
bbbh.cone.abc.com:/home 'bbbh.cone.abc.com
Because it contains spaces, I am using an array as shown below.
object1=$(echo "$line" | awk '{print $1}' )
object2=$(echo "$line" | awk '{print $2}' )
object3=$(echo "$line" | awk '{print $3}' )
object4=$(echo "$line" | awk '{print $4}' )
hiteshcharry=("$object1" "$object2" "$object3" "$object4")
grep "${hiteshcharry[@]}" <filename>
It gives me an error because of the spaces.
Below is an example.
I have the following line in my file.
st.cone.abc.com:/platform/sun4v/lib/sparcv9/libc_psr.so.1 space 'st.cone.abc.com space [/platform/sun4v/lib/sparcv9/libc_psr.so.1]'
So there are 2 spaces in the line above. I have written my script so that it can handle a line with a maximum of 4 spaces.
When I run the command below
omnidb -session "$sessionid" -detail | grep "${hiteshcharry[@]}"
it gives me an error because of the spaces. However, when I print the value of the array, it shows the correct value.
Example:
One of the lines from my file is shown below (it has 2 spaces):
st.cone.abc.com:/platform/sun4v/lib/sparcv9/libc_psr.so.1 space 'st.cone.abc.com space [/platform/sun4v/lib/sparcv9/libc_psr.so.1]'
I am putting this value in my array named hiteshcharry. When I run the command below
omnidb -session "$sessionid" -detail | grep "${hiteshcharry[@]}"
it gives me an error because of the spaces in the array's value. The output should contain only the lines equal to the value of the array hiteshcharry.
I hope this is clear now.
The output of the omnidb command is in the picture. So I want to grep the lines containing
"st.cone.abc.com:/platform/sun4v/lib/sparcv9/libc_psr.so.1 space
'st.cone.abc.com space [/platform/sun4v/lib/sparcv9/libc_psr.so.1]'" from
the output of the omnidb command, which is shown in the picture.
Thanks. I have added declare -p hiteshcharry and it now prints each element of the array. But I am getting the error shown in the picture.
When you pass your array to grep as "${array[@]}", grep sees each array element as a separate argument. So the first element becomes the pattern to search for, and the remaining elements become the names of the files to search. Obviously, that's not what you want.
You can use process substitution to make grep match the strings contained in your array, like this:
omnidb -session "$sessionid" -detail | grep -Fxf <(printf '%s\n' "${hiteshcharry[@]}")
printf prints the array elements, one per line.
grep -Fxf treats that output as a file of strings to search for (-F treats them as fixed strings rather than patterns; -x matches only whole lines of the omnidb output, preventing partial matches).
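The sketch below contrasts that approach with joining the elements into one pattern, which may be closer to what the question describes (matching the original line as a whole). The data is made up for illustration and omnidb is replaced by a plain string; "${parts[*]}" joins the elements with the first character of IFS, a space by default.

```shell
#!/usr/bin/env bash
# Hypothetical stand-ins for the array and for the omnidb output.
parts=("host:/home" "'host" "[/home]'")
listing=$'host:/home \'host [/home]\'\nother:/var \'other [/var]\'\nhost:/home'

# 1) One fixed-string pattern per element; -x demands a whole-line match,
#    so only the line that is exactly "host:/home" is selected.
per_element=$(printf '%s\n' "$listing" |
    grep -Fxf <(printf '%s\n' "${parts[@]}"))

# 2) All elements joined by spaces into a single fixed-string pattern;
#    this selects the line the elements were originally split from.
joined=$(printf '%s\n' "$listing" | grep -F -- "${parts[*]}")

printf 'per element: %s\n' "$per_element"
printf 'joined:      %s\n' "$joined"
```

Which of the two is appropriate depends on whether the array holds several independent search strings or the pieces of one line.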

Bash Convert text string into array with multiple \r\n as field separator

I have a Windows text file in the format:
line\r\n
line\r\n
line\r\n
\r\n
line\r\n
line\r\n
line\r\n
\r\n
...
I want to put this text file into an array where the field separator is \r\n\r\n. I searched for an answer, but nothing I found and tried worked. awk, for example, is too complex for me, and FS= did not work as I expected.
Commands to read arrays in bash can (as far as I know) only use single characters as a field separator, not complete strings like \r\n\r\n.
Workaround
First replace the field separator \r\n\r\n with a single character that does not occur in the string to be split. I found \x1e (the ASCII control character »Record Separator«) to work quite well.
Then read the array using the new (one character) field separator.
The field separator will always be removed when reading something to an array. But you can append the separator to each field.
Here is a pure bash solution to read the file file into the array array:
IFS=$'\x1e'
filecontent="$(< file)"
array=(${filecontent//$'\r\n\r\n'/$'\x1e'})
array=("${array[@]/%/$'\r\n\r\n'}")
IFS=$'\x1e' sets bash's field separator which is used to split strings into arrays. Depending on your script you may want to restore the old IFS afterwards (default is IFS=$' \t\n').
Results
For file
A B C\r\n
D E F\r\n
\r\n
G H I\r\n
\r\n
the resulting array will have two entries:
${array[0]}
A B C\r\n
D E F\r\n
\r\n
${array[1]}
G H I\r\n
\r\n
Known Problems
IFS at the beginning and end of the string will be trimmed. Repeated IFS will be squeezed. The file \r\n\r\n will result in an array without entries. Empty entries cannot be created.
\r\n\r\n is appended to all entries in all cases. The file A\r\n\r\nB will result in an array with the two entries A\r\n\r\n and B\r\n\r\n.
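The splitting step can be tried in isolation with a literal string instead of a file. The sketch below is one possible variant, not the answer's exact code: it leaves out the re-appending of \r\n\r\n, restores IFS afterwards, and adds set -f, because an unquoted expansion is also subject to filename globbing. The variable names content, sep and blocks are only for illustration.

```shell
#!/usr/bin/env bash
# Literal stand-in for the file content (two blocks).
content=$'A B C\r\nD E F\r\n\r\nG H I\r\n\r\n'
sep=$'\x1e'                 # ASCII Record Separator, assumed absent from the data

old_ifs=$IFS
set -f                      # keep *, ? and [ in the data from being globbed
IFS=$sep
# Swap each \r\n\r\n for the one-character separator, then let
# word splitting cut the string at that character.
blocks=( ${content//$'\r\n\r\n'/$sep} )
IFS=$old_ifs
set +f

printf 'number of blocks: %s\n' "${#blocks[@]}"
```

Spaces and single \r\n sequences inside a block are kept, since only \x1e is in IFS during the split.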
In Linux all lines of files are terminated with \n.
So your problem is not the \r\n , it is just the \r. So just remove it:
$ tr -d '\r' <file >newfile
To verify that \r is removed you can do:
$ head -n2 newfile |od -t x1c
This will get the first two lines of the new file and the od tool will dump / convert those lines in ascii hex codes. In ascii hex \r is \x0d and \n is \x0a.
Once you have removed the \r from your file you can do anything you want.
You can use all linux tools (including awk) straight forward without special settings.
To build an array you can use:
$ while read -r line;do data+=("$line");done <newfile
If you want to skip blank lines , this one is enough:
$ while read -r line;do [[ "$line" == "" ]] && continue;data+=("$line") ;done <file1
You can of course combine array creation with removal of the \r on the fly, without modifying your existing file, like this:
while read -r line;do [[ "$line" == "" ]] && continue;data+=("$line") ;done < <(tr -d '\r' <file1)
To see what is inside array "data" just use $ declare -p data
PS: By the way, using awk -v RS="\r\n" '{your awk code here}' should be enough to read the original file directly in awk as well. RS is the record (line) separator.
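If bash 4+ is available, the loop can be replaced by mapfile. The following is only a sketch under that assumption; the printf stands in for the Windows file, and grep -v '^$' drops the blank lines.

```shell
#!/usr/bin/env bash
# printf emulates a file with \r\n line endings and one blank line.
mapfile -t data < <(
    printf 'line1\r\nline2\r\n\r\nline3\r\n' |
    tr -d '\r' |            # strip every carriage return
    grep -v '^$'            # skip lines that are now empty
)

declare -p data
```

With a real file, the printf would be replaced by `tr -d '\r' < file1` at the start of the pipeline.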
I made this script in pure bash, even if the answer from socowi is pure bash too:
exec < filern.txt
declare -a array
acc=""
lineno=0
cr=$(echo -en "\r")
while read line; do
line=${line%$cr}
if [ -z "$line" ]; then
let lineno=$lineno+1
array[$lineno]=$acc
acc=""
else
[ ! -z "$acc" ] && acc="$acc--" # you can use any separator here
acc="$acc$line"
fi
done
echo "Read file in array:"
for ((i=1; i<= ${#array[@]}; i++)) do
printf "%3.3d |%s|\n" $i "${array[$i]}"
done
It reads a "real" line of input at a time, and strips the trailing \r.
At this point, a sequence \r\n\r\n turns into an empty line, so that is used to assign the array elements one after the other.
The output from the example file is:
Read file in array:
001 |line--line--line|
002 |line--line--line|
The separator could also be a \r, or whatever. I couldn't find a way to strip the trailing \r directly with the expansion line=${line% ?? }, so I used a variable. The same trick can be used to add a "strange" separator to the variable acc. I hope it helps.
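For what it's worth, bash's ANSI-C quoting can express the carriage return inside the expansion itself, as ${line%$'\r'}. The sketch below is a condensed variant of the script above under a few assumptions of its own: it reads from a printf instead of filern.txt, uses a zero-based array named paragraphs, and also stores a last block that is not followed by an empty line.

```shell
#!/usr/bin/env bash
paragraphs=()
acc=""

# "|| [ -n "$line" ]" also processes a final line without a newline.
while IFS= read -r line || [ -n "$line" ]; do
    line=${line%$'\r'}                   # strip one trailing carriage return
    if [ -z "$line" ]; then
        # An empty line ends the current block.
        [ -n "$acc" ] && paragraphs+=("$acc")
        acc=""
    else
        acc+="${acc:+--}$line"           # join lines of a block with "--"
    fi
done < <(printf 'a\r\nb\r\n\r\nc\r\n')

# Store the last block if the input did not end with an empty line.
[ -n "$acc" ] && paragraphs+=("$acc")

printf '|%s|\n' "${paragraphs[@]}"
```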

Remove all vowels from a file name using shell script

Current code:
find . -depth | \
while read LONG; do
SHORT=$( basename "$LONG" | tr '[aeiou]' '[ ]' )
DIR=$( dirname "$LONG" )
if [ "${LONG}" != "${DIR}/${SHORT}" ]; then
mv "${LONG}" "${DIR}/${SHORT}"
fi
done
So if I have files like aaa abc bdf, I get the files ' ' ' bc' 'bdf'.
The way I want this to work is to return 'aaa' 'bc' 'bdf'.
(Completely remove the a from the second file, and if all the characters, excluding the file extension, are vowels, ignore the file.)
I think the two problems with your solution are:
You're replacing vowels with a space. Shouldn't you replace them with an empty string?
Then you need to test if SHORT is empty. If it is, discard it, perhaps by assigning SHORT=LONG.
Remove all vowels:
tr -d aeiou
Ignore if basename (excluding the file extension) is only vowels:
case $SHORT in ''|.*) continue;; esac
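Put together, the renaming rule could be sketched as a function that only computes the new name and leaves the mv to the caller. The function name strip_vowels is hypothetical, it uses bash parameter expansion instead of tr, and like the tr command above it only handles lowercase vowels.

```shell
#!/usr/bin/env bash
# strip_vowels is a hypothetical helper: it prints the name with the
# vowels removed from the part before the last dot, or the unchanged
# name if that part consists of vowels only.
strip_vowels() {
    local name=$1 stem ext="" short
    if [[ $name == ?*.* ]]; then         # non-empty stem plus an extension
        stem=${name%.*}
        ext=.${name##*.}
    else
        stem=$name
    fi
    short=${stem//[aeiou]/}              # delete all lowercase vowels
    if [[ -z $short ]]; then
        printf '%s\n' "$name"            # only vowels: keep the name as is
    else
        printf '%s\n' "$short$ext"
    fi
}

for f in aaa abc bdf hello.txt; do
    printf '%s -> %s\n' "$f" "$(strip_vowels "$f")"
done
```

In the loop from the question, SHORT could then be set from this function applied to the basename, and the existing comparison would skip the unchanged names.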
