Using Bash, I am extracting multiple strings from a binary file. Those strings are filenames, so only NUL and slash cannot appear. I use a function that outputs those filenames to an array. I know I can set IFS to newline to get filenames with spaces. I hope it is possible to separate the function's multiline output with NUL and save it in an array, so that any legal *nix filename can be handled. If I set IFS to '' or '\0' I get some numbers instead of names. Not sure why; maybe I have overlooked something pretty basic :)
How do I get all possible filename strings, including not just spaces but newlines and other characters/byte values as well?
Here is my simplified example.
#! /bin/bash
binaryFile=$1
getBinaryList () {
fileNameAddresses=( 123 456 789 ) #Just a mock example for simplicity
for currAddr in "${fileNameAddresses[@]}"
do
fileNameStart=$((currAddr)) #Just a mock example for simplicity
fileNameLength=48 #Just a mock example for simplicity
currFileName=$( dd status=none bs=1 skip=$fileNameStart count=$fileNameLength if="$binaryFile" )
printf "%s\n" "$currFileName"
done
}
IFS=$'\n'
allFileNames=($(getBinaryList $binaryFile))
echo ${#allFileNames[@]}
printf "%s\n" "${allFileNames[@]}"
Your idea is right, and with a couple of slight modifications you can achieve what you are looking for. In the getBinaryList function, instead of using printf to emit each name followed by a newline, use a NUL byte separator, i.e.
printf "%s\0" "$currFileName"
and now, instead of setting IFS to newline and slurping the result into an array, use mapfile, which puts the results directly into an array. Its -d '' option (Bash 4.4 or newer) delimits records on the NUL byte, and -t strips that delimiter from each element; the array name is the last argument. So your result can look like
mapfile -t -d '' allFileNames < <(getBinaryList "$binaryFile")
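A self-contained sketch of the NUL-delimited round trip. Since getBinaryList needs a real binary file, it uses a hypothetical emitNames function that prints three names, one of them containing a newline. It assumes Bash 4.4 or newer for mapfile -d.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for getBinaryList: emits each name followed by a NUL byte.
emitNames() {
  local name
  for name in 'plain.txt' 'with space.txt' $'with\nnewline.txt'; do
    printf '%s\0' "$name"
  done
}

# -d '' splits records on NUL (Bash 4.4+); -t drops the NUL from each element.
mapfile -t -d '' allFileNames < <(emitNames)

echo "${#allFileNames[@]}"          # prints 3
printf '<%s>\n' "${allFileNames[@]}"
```

One caveat still applies to the original loop: currFileName=$( dd ... ) strips trailing newlines from the captured text, and a bash variable cannot hold a NUL byte, so a name that ends in a newline would still be altered before printf ever sees it.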
I'm trying to solve a problem in shell.
I'm trying to find a way to delete all newlines from each element of an array. I tried to do this with a for loop.
The Strings look like this (always three numbers, separated with dots)
"14.1.3\n" and I need to get rid of the newline at the end.
This is what I tried to do:
As a single-liner
for i in ${backup_versions[*]}; do backup_versions[$i]=echo "$i" | tr '\n' ' ' ; done
Easier to read
for i in ${backup_versions[*]};
do
backup_versions[$i]=echo "$i" | tr '\n' ' '
done
I think I am reassigning the element with the wrong syntax, but I have tried every variant of the syntax that I found or knew myself.
The deletion of the newline works just fine; only the reassignment is my problem.
If the strings are always of that form and don't contain any whitespace or wildcard characters, you can just use the shell's word-splitting to remove extraneous whitespace characters from the values.
backup_versions=(${backup_versions[*]})
If you used mapfile to create the array, you can use the -t option to prevent it from including the newline in the value in the first place.
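A minimal sketch of the mapfile -t approach. The printf call is an assumed stand-in for whatever command actually produced the version strings; the second mapfile without -t is there only for comparison.

```bash
#!/usr/bin/env bash
# printf stands in for the command that produced the versions (an assumption).
mapfile -t backup_versions < <(printf '%s\n' 14.1.3 14.2.0 15.0.1)

# For comparison: without -t each element keeps its trailing newline.
mapfile with_newlines < <(printf '%s\n' 14.1.3 14.2.0 15.0.1)

declare -p backup_versions with_newlines
```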
Use Bash's string substitution expansion ${var//old/new} to delete all newlines, and dynamically create a declaration for a new array, with elements stripped of newlines:
#!/usr/bin/env bash
backup_versions=(
$'foo\nbar\n'
$'\nbaz\ncux\n\n'
$'I have spaces\n and newlines\n'
$'It\'s a \n\n\nsingle quote and spaces\n'
$'Quoted "foo bar"\n and newline'
)
# shellcheck disable=SC2155 # Dynamically generated declaration
declare -a no_newlines="($(
printf '%q ' "${backup_versions[@]//$'\n'/}"
))"
# Debug print original array declaration
declare -p backup_versions
# Debug print the declaration of no_newlines
declare -p no_newlines
declare -a no_newlines="($(: Creates a dynamically generated declaration for the no_newlines array.
printf '%q ': Print each argument with quotes if necessary and add a trailing space.
"${backup_versions[@]//$'\n'/}": Expand each element of the backup_versions array, with // replacing every $'\n' newline by nothing, which deletes it.
Finally, the no_newlines array will contain all entries from backup_versions, with newlines stripped out.
The debug output matches expectations:
declare -a backup_versions=([0]=$'foo\nbar\n' [1]=$'\nbaz\ncux\n\n' [2]=$'I have spaces\n and newlines\n' [3]=$'It\'s a \n\n\nsingle quote and spaces\n' [4]=$'Quoted "foo bar"\n and newline')
declare -a no_newlines=([0]="foobar" [1]="bazcux" [2]="I have spaces and newlines" [3]="It's a single quote and spaces" [4]="Quoted \"foo bar\" and newline")
You can use a modifier when expanding the array, then save the modified contents. If the elements just have a single trailing newline, use substring removal to trim it:
backup_versions=("${backup_versions[@]%$'\n'}")
(Note: when expanding an array, you should almost always use [@] instead of [*], and put double quotes around it to avoid weird parsing. Bash doesn't generally let you combine modifiers, but you can combine one with [@] to apply the modifier to each element as it's expanded.)
If you want to remove all newlines from the elements (in case there are multiple newlines in some elements), use a substitution (with an empty replacement string) instead:
backup_versions=("${backup_versions[@]//$'\n'/}")
(But as several comments have mentioned, it'd probably be better to look at how the array's being created, and see if it's possible to just avoid putting newlines in the array in the first place.)
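A small sketch contrasting the two modifiers on sample data (assumed, not from the question): % removes at most one trailing newline per element, while // removes every newline anywhere in the element.

```bash
#!/usr/bin/env bash
backup_versions=($'14.1.3\n' $'14.2.0\n\n' '15.0.1')

# %$'\n' removes at most one trailing newline from each element.
trimmed=("${backup_versions[@]%$'\n'}")

# //$'\n'/ removes every newline from each element.
stripped=("${backup_versions[@]//$'\n'/}")

declare -p trimmed stripped
```

Here trimmed[1] still ends in one newline because its source element had two, which is why the // form is the safer choice when the number of newlines is unknown.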
I have a bash script that breaks a bash array into pairs and can match on either element:
declare -a arr=(
"apple" "fruit"
"cabbage" "vegetables"
)
for ((i=0; i<${#arr[@]}; i+=2)); do
echo "${arr[i]} ${arr[i+1]}"
done
So when you run this script, it prints out each pair of elements from the array, like this:
# bash script
apple fruit
cabbage vegetables
and I can also choose any element I want with ${arr[i+#]}.
Now I'm trying to read this array from a separate text file, instead of inside the script since I'll be manipulating this array in the future.
I've tried this method so far, which looked promising at first but didn't work at all:
filename='stuff.log'
filelines=`cat $filename`
for line in $filelines ; do
props=($line)
echo "${props[0]} ${props[1]}"
done
This should have printed the content below to the console (basically the same thing as the first script, where the array is inside the script), but instead it printed nothing.
# bash script
apple fruit
cabbage vegetables
And the contents of stuff.log are:
"apple" "fruit"
"cabbage" "vegetables"
How can I basically read the array from a separate file for the first script and also be able to manipulate the content of array file in the future?
I think, if you trust your input, you can do:
eval "props=($(<stuff.log))"
eval is evil, and here it is what removes the leading and trailing ". It will also properly parse elements with spaces in them. We can do a little safer by reading the file into an array and then removing the leading and trailing ":
IFS=$' \n' props=($(<stuff.log))
IFS=$'\n' props=($(printf "%s\n" "${props[@]}" | sed 's/^"//;s/"$//'))
Anyway, I would hesitate to use such a method in production code. It would be better to write a proper parser that takes " into account and reads the input char by char.
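A minimal char-by-char parser along those lines, offered as a sketch. It assumes a simplified format: every element is wrapped in double quotes, elements are separated by whitespace, and quotes inside an element cannot be escaped. The function name parse_quoted is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: fills the global array props from text like  "apple" "fruit"
parse_quoted() {
  local input=$1 ch current='' in_quotes=0 i
  props=()
  for (( i = 0; i < ${#input}; i++ )); do
    ch=${input:i:1}
    if [[ $ch == '"' ]]; then
      if (( in_quotes )); then
        props+=("$current")       # closing quote ends an element
        current=''
        in_quotes=0
      else
        in_quotes=1               # opening quote starts an element
      fi
    elif (( in_quotes )); then
      current+=$ch                # keep everything between the quotes
    fi                            # characters outside quotes are ignored
  done
  if (( in_quotes )); then
    echo "parse_quoted: unterminated quote" >&2
    return 1
  fi
}

# With the real file this would be: parse_quoted "$(<stuff.log)"
parse_quoted $'"apple" "fruit"\n"red cabbage" "vegetables"'
for (( i = 0; i < ${#props[@]}; i += 2 )); do
  echo "${props[i]} ${props[i+1]}"
done
```

Unlike the eval version, nothing in the file is ever executed, and an element such as "red cabbage" stays in one piece.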
If you want to read a file into an array, use mapfile or readarray commands (they are exactly the same command).
I thought setting IFS to $'\n' would help me in reading an entire file into an array, as in:
IFS=$'\n' read -r -a array < file
However, the above command only reads the first line of the file into the first element of the array, and nothing else.
Even this reads only the first line into the array:
string=$'one\ntwo\nthree'
IFS=$'\n' read -r -a array <<< "$string"
I came across other posts on this site that talk about either using mapfile -t or a read loop to read a file into an array.
Now my question is: when do I use IFS=$'\n' at all?
You are a bit confused as to what IFS is. IFS is the Internal Field Separator used by bash to perform word-splitting to split lines into words after expansion. The default value is [ \t\n] (space, tab, newline).
By reassigning IFS=$'\n', you are removing the ' \t' and telling bash to only split words on newline characters (your thinking is correct). That has the effect of allowing some line with spaces to be read into a single array element without quoting.
Where your implementation fails is in your read -r -a array < file. The -a causes words in the line to be assigned to sequential array indexes. However, you have told bash to only break on a newline (which is the whole line). Since you only call read once, only one array index is filled.
You can either do:
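while IFS=$'\n' read -r line; do
array+=( "$line" )
done < "$filename"
(IFS=$'\n' here applies only to read; it is the quotes around "$line" that keep each line in a single element, because an unquoted $line would be split using the normal IFS.)
Or, setting IFS=$'\n' for the whole shell, you could do
IFS=$'\n'
array=( $(<"$filename") )
or finally, you could use readarray (Bash 4 or newer), which does not need IFS at all; -t drops the trailing newline from each element:
readarray -t array <"$filename"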
Try them and let me know if you have questions.
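A sketch that runs the three approaches on the same temporary file, so their results can be compared. The file content is an assumption chosen to include a line with an inner space and one with leading spaces.

```bash
#!/usr/bin/env bash
filename=$(mktemp)
printf '%s\n' 'one two' '  indented' 'three' > "$filename"

# 1) read loop: "$line" keeps each line whole.
loop_array=()
while IFS=$'\n' read -r line; do
  loop_array+=( "$line" )
done < "$filename"

# 2) IFS=$'\n' word splitting; set -f prevents lines like '*' from being globbed.
set -f
IFS=$'\n'
split_array=( $(<"$filename") )
unset IFS                 # back to default splitting behaviour
set +f

# 3) readarray (Bash 4+).
readarray -t mapped_array < "$filename"

declare -p loop_array split_array mapped_array
rm -f "$filename"
```

All three agree on this input. One difference to keep in mind: because newline is an IFS whitespace character, approach 2 silently drops empty lines, while the loop and readarray keep them as empty elements.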
Your second try almost works, but you have to tell read that it should not just read until a newline (the default behaviour), but, for example, until a NUL byte (-d ''):
$ IFS=$'\n' read -a arr -d '' <<< $'a b c\nd e f\ng h i'
$ declare -p arr
declare -a arr='([0]="a b c" [1]="d e f" [2]="g h i")'
But as you pointed out, mapfile/readarray is the way to go if you have it (requires Bash 4.0 or newer):
$ mapfile -t arr <<< $'a b c\nd e f\ng h i'
$ declare -p arr
declare -a arr='([0]="a b c" [1]="d e f" [2]="g h i")'
The -t option removes the newlines from each element.
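One detail of the read -d '' form worth knowing: ordinary text contains no NUL byte, so read reaches end of input and returns a non-zero status even though it filled the array. This sketch captures that status and shows the usual guard for scripts running under set -e.

```bash
#!/usr/bin/env bash
status=0
IFS=$'\n' read -r -d '' -a arr <<< $'a b c\nd e f\ng h i' || status=$?
declare -p arr
echo "read exit status: $status"    # non-zero: no NUL byte before end of input

# Under set -e, guard the call so the script does not abort here.
IFS=$'\n' read -r -d '' -a arr2 <<< $'x\ny' || true
```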
As for when you'd want to use IFS=$'\n':
As just shown: if you want to read a file into an array, one line per element, your Bash is older than 4.0, and you don't want to use a loop.
Some people promote using an IFS without a space to avoid unexpected side effects from word splitting; the proper approach in my opinion, though, is to understand word splitting and make sure to avoid it with proper quoting as desired.
I've seen IFS=$'\n' used in tab completion scripts, for example the one for cd in bash-completion: this script fiddles with paths and replaces colons with newlines, to then split them up using that IFS.
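A sketch of that colon-to-newline trick, not the actual bash-completion code: split_path and path_parts are hypothetical names, and local - requires Bash 4.4 or newer.

```bash
#!/usr/bin/env bash
# Hypothetical helper: splits a colon-separated list into the global array path_parts.
split_path() {
  local -                      # Bash 4.4+: restore shell options (set -f) on return
  local IFS=$'\n'              # split on newlines only; the caller's IFS is untouched
  set -f                       # no pathname expansion of the unquoted expansion below
  path_parts=( ${1//:/$'\n'} ) # colons become newlines, then word splitting applies
}

split_path '/usr/bin:/opt/my tools/bin:/tmp'
printf '<%s>\n' "${path_parts[@]}"
```

Because space is not in this IFS, "/opt/my tools/bin" survives as one element. Empty components such as the middle of a::b are dropped, since newline counts as IFS whitespace.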
I have a file called failedfiles.txt with the following content:
failed1
failed2
failed3
I need to use grep to return the content on each line in that file, and save the output in a list to be accessed. So I want something like this:
temp_list=$(grep "[a-z]" failedfiles.txt)
However, the problem with this is that when I type
echo ${temp_list[0]}
I get the following output:
failed1 failed2 failed3
But what I want is when I do:
echo ${temp_list[0]}
to print
failed1
and when I do:
echo ${temp_list[1]}
to print
failed2
Thanks.
@devnull's helpful answer explains why your code didn't work as expected: command substitution always returns a single string (possibly composed of multiple lines).
However, simply putting (...) around a command substitution to create an array of lines will only work as expected if the lines output by the command do not have embedded spaces - otherwise, each individual (whitespace-separated) word will become its own array element.
Capturing command output lines at once, in an array:
To capture the lines output by an arbitrary command in an array, use the following:
bash < 4 (e.g., on OSX as of OS X 10.9.2): use read -a
IFS=$'\n' read -rd '' -a linesArray <<<"$(grep "[a-z]" failedfiles.txt)"
bash >= 4: use readarray:
readarray -t linesArray <<<"$(grep "[a-z]" failedfiles.txt)"
Note:
<<< initiates a so-called here-string, which pipes the string to its right (which happens to be the result of a command substitution here) into the command on the left via stdin.
While command <<< string is functionally equivalent to echo string | command in principle, the crucial difference is that the latter creates subshells, which make variable assignments in command pointless - they are localized to each subshell.
An alternative to combining here-strings with command substitution is [input] process substitution - <(...) - which, simply put, allows using a command's output as if it were an input file; the equivalent of <<<"$(command)" is < <(command).
read: -a reads into an array, and IFS=$'\n' ensures that every line is considered a separate field and thus read into its own array element; -d '' ensures that ALL lines are read at once (before breaking them into fields); -r turns off interpretation of escape sequences in the input.
readarray (also callable as mapfile) directly breaks input lines into an array of lines; -t ensures that the terminating \n is NOT included in the array elements.
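A sketch comparing the naive (...) capture with both recommended forms. The produce function is a hypothetical stand-in for grep "[a-z]" failedfiles.txt, with one line deliberately containing a space.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for: grep "[a-z]" failedfiles.txt
produce() { printf '%s\n' failed1 'failed 2' failed3; }

# Naive: word splitting also breaks on the space, giving 4 elements.
naive=( $(produce) )

# bash < 4: read returns non-zero at end of input, hence || true.
IFS=$'\n' read -rd '' -a linesArray <<<"$(produce)" || true

# bash >= 4
readarray -t linesArray4 < <(produce)

declare -p naive linesArray linesArray4
```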
Looping over command output lines:
If there is no need to capture all lines in an array at once and looping over a command's output line by line is sufficient, use the following:
while IFS= read -r line; do
# ...
done < <(grep "[a-z]" failedfiles.txt)
IFS= ensures that each line is read unmodified in terms of whitespace; remove it to have leading and trailing whitespace trimmed.
-r ensures that the lines are read 'raw' in that substrings in the input that look like escape sequences - e.g., \t - are NOT interpreted as such.
Note the use of [input] process substitution (explained above) to provide the command output as input to the read loop.
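A sketch of the IFS= difference described above, again with a hypothetical produce function standing in for the grep command; the first sample line has leading spaces.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for: grep "[a-z]" failedfiles.txt
produce() { printf '%s\n' '  failed1' 'failed2'; }

kept=() trimmed=()
while IFS= read -r line; do
  kept+=( "$line" )        # IFS= : leading spaces preserved
done < <(produce)

while read -r line; do
  trimmed+=( "$line" )     # default IFS: leading/trailing whitespace trimmed
done < <(produce)

declare -p kept trimmed
```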
You did not create an array. What you did was command substitution, which simply puts the output of a command into a variable.
In order to create an array, say:
temp_list=( $(grep "[a-z]" failedfiles.txt) )
You might also want to refer to Guide on Arrays.
The proper and portable way to loop over lines in a file is simply
while IFS= read -r line; do
printf '%s\n' "$line"   # do something with "$line"
done <failedfiles.txt
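One edge case of that loop: if the last line of the file has no trailing newline, read returns non-zero for it and the loop body is skipped. A common guard, sketched here with an assumed temporary file, also accepts a non-empty $line.

```bash
#!/usr/bin/env bash
tmp=$(mktemp)
printf 'failed1\nfailed2\nfailed3' > "$tmp"    # no newline after the last line

count=0
# read fails on the unterminated last line, but $line still holds its text.
while IFS= read -r line || [[ -n $line ]]; do
  printf 'got: %s\n' "$line"
  count=$((count + 1))
done < "$tmp"
rm -f "$tmp"

echo "$count"    # prints 3
```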
I'm trying to write a KSH script for processing a file consisting of name-value pairs, several of them on each line.
The format is:
NAME1 VALUE1,NAME2 VALUE2,NAME3 VALUE3, etc
Suppose I write:
read l
IFS=","
set -A nvls $l
echo "${nvls[1]}"
This will give me the second name-value pair, nice and easy. Now suppose that the task is extended so that values can include commas. They should be escaped, like this:
NAME1 VALUE1,NAME2 VALUE2_1\,VALUE2_2,NAME3 VALUE3, etc
Obviously, my code no longer works, since read strips all quoting and the second element of the array will be just "NAME2 VALUE2_1".
I'm stuck with an older ksh that does not have "read -A array". I tried various tricks with "read -r" and "eval set -A ...", to no avail. I can't use "read nvl1 nvl2 nvl3" to do the unescaping and splitting inside read, since I don't know beforehand how many name-value pairs are in each line.
Does anyone have a useful trick up their sleeve for me?
PS
I know that I could do this in no time in Perl, Python, or even awk. However, I have to do it in ksh (... or die trying ;)
As often happens, I devised an answer minutes after asking the question in a public forum :(
I worked around the quoting/unquoting issue by piping the input file through the following sed script:
sed -e 's/\([^\]\),/\1\
/g;s/$/\
/'
It converted the input into:
NAME1.1 VALUE1.1
NAME1.2 VALUE1.2_1\,VALUE1.2_2
NAME1.3 VALUE1.3
<empty line>
NAME2.1 VALUE2.1
<second record continues>
Now, I can parse this input like this:
while read name value ; do
echo "$name => $value"
done
Value will have its commas unquoted by "read", and I can stuff "name" and "value" in some associative array, if I like.
PS
Since I can't accept my own answer, should I delete the question, or ...?
You can also change the \, pattern to something else that is known not to appear in any of your strings, and then change it back after you've split the input into an array. You can use the ksh builtin pattern-substitution syntax to do this, you don't need to use sed or awk or anything.
read -r l
l=${l//\\,/!!}
IFS=","
set -A nvls $l
unset IFS
echo ${nvls[1]//!!/,}
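For comparison, a sketch of the same placeholder idea in bash rather than ksh (it uses read -a and $'...' quoting, so it will not run on the older ksh from the question). It assumes the control byte \x1f never occurs in the data; the sample record is hard-coded where a script would use IFS= read -r line.

```bash
#!/usr/bin/env bash
# Sample record; in a script this would come from: IFS= read -r line
line='NAME1 VALUE1,NAME2 VALUE2_1\,VALUE2_2,NAME3 VALUE3'

placeholder=$'\x1f'                     # assumed never to appear in the input
l=${line//\\,/$placeholder}             # hide the escaped commas
IFS=, read -r -a nvls <<< "$l"          # split on the remaining commas
nvls=("${nvls[@]//$placeholder/,}")     # restore them as plain commas

printf '%s\n' "${nvls[@]}"
```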