data.txt:
hello world
goodbye mars

goodbye perl6
back to perl5
myprog.py:
my $fname = 'data.txt';
my $infile = open($fname, :r, nl => "\n\n");
for $infile.lines(nl => "\n\n") -> $para {
    say $para;
    say '-' x 10;
}
Actual output:
hello world
----------
goodbye mars
----------

----------
goodbye perl6
----------
back to perl5
----------
Desired output:
hello world
goodbye mars
-----------
goodbye perl6
back to perl5
-----------
...
$ perl6 -v
This is perl6 version 2015.03-21-gcfa4974 built on MoarVM version 2015.03
This appears to be a bug in Rakudo/MoarVM, going back to the fact that MoarVM expects a single grapheme as separator instead of an arbitrary string (cf syncfile.c:38, syncfile.c:119 and syncfile.c:91, which shows that the last character of the separator string is used instead of the whole string).
As a quick workaround (but beware that this reads the entire file into memory), use
$fname.IO.slurp.split("\n\n")
instead of $infile.lines().
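The same slurp-and-split idea can be sketched in Python for comparison. This is only an illustration under stated assumptions: the helper name split_paragraphs is made up here, and the sample text is the one implied by the question's desired output.

```python
def split_paragraphs(text):
    # Split on blank lines (two consecutive newlines), like the
    # slurp/split workaround, then drop empty chunks and stray newlines.
    chunks = (chunk.strip("\n") for chunk in text.split("\n\n"))
    return [chunk for chunk in chunks if chunk]


# Sample text assumed from the question's desired output.
sample = "hello world\ngoodbye mars\n\ngoodbye perl6\nback to perl5\n"

for para in split_paragraphs(sample):
    print(para)
    print("-" * 10)
```

Like the workaround itself, this holds the whole text in memory at once.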
You should also file a bug report or ask in #perl6 on Freenode if this is a known issue.
I have the following challenge:
my source_file.txt contains:
track001="alpha"
some text ... but also again the string track001 without " symbol... some more text
track002="beta"
some text ... but also again the string track002 without " symbol ... some more text
track027="gamma"
some text ... but also again the string track003 without " symbol ... some more text
track...="..."
... about 30 entries.
Now, I want to
search for the string next to trackxxx=" (=> find the alpha, beta and gamma string)
afterwards provide the list to the user for further pre-processing in the terminal:
| Reference | Title | Status           |
|-----------|-------|------------------|
| 001       | alpha | [ not selected ] |
| 002       | beta  | [ not selected ] |
| ...       | ...   | [ not selected ] |
| 027       | gamma | [ not selected ] |
type Reference number (xxx): < user prompt>
change Status (selected = 1 / not selected = 0): < user prompt >
I thought about:
copying the file and deleting all lines that do not start with trackxxx=", though I guess there is a nice sed command that does the magic.
pasting everything into a matrix to ease the pre-processing.
keeping the pre-processing simple (terminal interaction, no zenity etc.). Maybe someone has an idea to make the selection more user friendly.
Appreciate your support, thank you!
As a partial answer, because of the request for explanation of my comments:
sed -n 's/^track\(.*\)="\([^"]*\).*/ \1 \2 /p' will give you a list of
001 alpha
002 beta
...
027 gamma
which can be fed into a for-loop in bash to do the actual processing.
sed -n suppresses the default output, so nothing is printed unless a line is explicitly printed.
s/pattern/replacement/ replaces the pattern by the replacement.
^track matches track if it is at the beginning of a line (^).
\(.*\) creates a capture group; the \( opens the capture group and the \) closes it. The capture group contains all characters up to the next element in the pattern.
=" is the next element in the pattern: a literal =".
\([^"]*\) is the second capture group. All characters that are not " are added to this group.
.* matches the rest of the line. It will most probably begin with a ", but if you forget the closing ", that's OK too.
The replacement string \1 \2 is a combination of the two capture groups, \1 for the first and \2 for the second.
p explicitly prints the line if the pattern is matched. Because of the -n, normal output is suppressed, and you will get only the explicitly printed lines.
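For readers who prefer to see the capture groups at work outside sed, here is a rough Python sketch of the same extraction. The names TRACK_RE and extract_tracks are hypothetical, and the first group is non-greedy here (an assumption that differs slightly from the sed pattern) so that it stops at the first =".

```python
import re

# Same shape as the sed pattern: track<reference>="<title>
# The first group is non-greedy so it ends at the first =" on the line.
TRACK_RE = re.compile(r'^track(.*?)="([^"]*)')


def extract_tracks(lines):
    """Return (reference, title) pairs for lines starting with track...="."""
    pairs = []
    for line in lines:
        match = TRACK_RE.match(line)
        if match:
            pairs.append((match.group(1), match.group(2)))
    return pairs


sample = [
    'track001="alpha"',
    'some text ... but also again the string track001 without " symbol',
    'track002="beta"',
    'track027="gamma"',
]

for reference, title in extract_tracks(sample):
    print(reference, title)
```

Lines that merely mention a track name in running text are skipped because the pattern is anchored at the start of the line.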
I'm trying to print text to an output file, and I need my output file to match the proper formatting exactly.
I need to do this so that when my program's output is compared to what the proper output should be, (talking large output files here) a simple string comparison of the output files should flag them as perfectly identical.
Here is an example of the EXACT expected output:
| Item 1234 | CALCULATOR  | $  0.45 |
| Item 5678 | USA_FLAG    | $ 10.99 |
| Item 9012 | WITCH_BROOM | $ 18.00 |
Notice how with this formatting there are variable amounts of spaces after each string and before the double numbers, but they still manage to line up perfectly.
So how do you output this kind of formatting?
I'm assuming there is something about fprintf() that I just don't know, but again, I don't know soooooooooooooo
You can specify flags and a width in the printf format string, e.g.
printf( "%10s", "test" ) will print at least 10 characters, right-justifying the argument and padding with spaces on the left if it is shorter, e.g. (dots added for spaces)
......test
You can also specify left justification with the - flag, e.g.
printf( "%-10s", "test" )
test......
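Python's % operator follows the same width and flag conventions as printf, so the idea can be tried out quickly there. The sketch below is only illustrative: format_row is a made-up helper, and the widths 11 and 5 are assumptions chosen to fit the longest name and price in the question's sample, not values taken from the real expected output.

```python
def format_row(item_id, name, price):
    # %-11s left-justifies the name and pads it on the right to 11 characters.
    # %5.2f right-justifies the price in a 5-character field with 2 decimals.
    # Both widths are assumptions made for this sketch.
    return "| Item %d | %-11s | $ %5.2f |" % (item_id, name, price)


rows = [
    (1234, "CALCULATOR", 0.45),
    (5678, "USA_FLAG", 10.99),
    (9012, "WITCH_BROOM", 18.00),
]

for row in rows:
    print(format_row(*row))
```

The same format string should carry over to fprintf() in C, with the arguments passed separately instead of as a tuple.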
Given:
array(a1)=123
array(b1)=456
My command is:
for test in ${array[@]}; do
    echo "Hello "$!test "$test" Hi"
done
The output is:
Hello test 123 Hi
Hello test 456 Hi
Expected output is:
Hello a1 123 Hi
Hello b1 456 Hi
test is a normal variable and doesn't store any reference to the array. In your case writing $!test is the same as writing ${someUndefinedVariable}test (see ✱). The undefined variable will expand to the empty string. test is a literal string.
To print the keys and values, you have to iterate over the keys and retrieve the corresponding values manually:
declare -A array
array[a1]=123
array[b1]=456
for key in "${!array[@]}"; do
    echo "key=$key, value=${array[$key]}"
done
By the way, I'm surprised your command even ran without an error; a closing " is missing. You cannot nest quotation marks. After the first ", the second " ends the quotation:
|quoted|       |quoted   |started quote without end -->
|      |       |     |   |
"Hello "$!test "$test" Hi"
        |     |       | |
        |unquoted     |unquoted
✱ $! is actually a special variable that contains the process number of the last background command. Since you did not start any background commands in your session $! is empty.
I have a text file in the following format:
hello world line 1 1234567_45674345 new line this is hello world
this is line 2 and new hello world 980765789_32345332
this is line 3 8976578_45345678 end of file
Now I have an input number, which is the part before the underscore; for example, for 1234567_45674345 it is 1234567.
I need to search for this number and return the complete word, i.e. 1234567_45674345.
What I tried:
I used the grep command, but it returns the complete line, i.e. hello world line 1 1234567_45674345 new line this is hello world
What is a suitable way to do this?
The file size is on the order of 10 MB.
grep -oP '1234567[^ ]+' inputFile
1234567_45674345
If the input number is stored in a variable:
var=1234567
grep -oP "${var}[^ ]+" inputFile
1234567_45674345
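The "print only the matching word" behaviour of grep -o can also be sketched in Python. This is a hedged illustration, not a replacement for the grep answer: find_tokens is a hypothetical helper, and unlike the grep pattern it requires the underscore and a word boundary before the number, which are assumptions based on the question's format.

```python
import re


def find_tokens(text, number):
    # \b stops the match from starting in the middle of a longer number.
    # re.escape guards against regex metacharacters in the input.
    pattern = re.compile(r"\b" + re.escape(number) + r"_\S+")
    return pattern.findall(text)


sample = (
    "hello world line 1 1234567_45674345 new line this is hello world\n"
    "this is line 2 and new hello world 980765789_32345332\n"
    "this is line 3 8976578_45345678 end of file\n"
)

print(find_tokens(sample, "1234567"))
```

A 10 MB file is small enough to read into memory in one go with this approach.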
In bash, I can pass quoted arguments to a command like this:
$ printf '[%s]\n' 'hello world'
[hello world]
But I can't get it to work right if the argument is coming from a subshell:
$ cat junk
'hello world'
$ printf '[%s]\n' $(cat junk)
['hello]
[world']
Or:
$ cat junk
hello world
$ printf '[%s]\n' $(cat junk)
[hello]
[world]
Or:
$ cat junk
hello\ world
$ printf '[%s]\n' $(cat junk)
[hello\]
[world]
How do I do this correctly?
EDIT: The solution also needs to handle this case:
$ printf '[%s]\n' abc 'hello world'
[abc]
[hello world]
So this solution doesn't work:
$ cat junk
abc 'hello world'
$ printf '[%s]\n' "$(cat junk)"
[abc 'hello world']
The question at Bash quoting issue has been suggested as a duplicate. However, it isn't clear how to apply its accepted answer; the following fails:
$ cat junk
abc 'hello world'
$ FOO=($(cat junk))
$ printf '[%s]\n' "${FOO[@]}"
[abc]
['hello]
[world']
There's no one good solution here, but you can choose between bad ones.
This answer requires changing the file format:
Using a NUL-delimited stream for the file is the safest approach; literally any C string (thus, any string bash can store as an array element) can be written and read in this manner.
# write file as a NUL-delimited stream
printf '%s\0' abc 'hello world' >junk
# read file as an array
foo=( )
while IFS= read -r -d '' entry; do
  foo+=( "$entry" )
done <junk
If valid arguments can't contain newlines, you may wish to leave out the -d '' on the reading side and change the \0 on the writing side to \n to use newlines instead of NULs. Note that UNIX filenames can contain newlines, so if your possible arguments include filenames, this approach would be unwise.
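To make the NUL-delimited round trip concrete, here is a small Python sketch of the same file format. The helper names write_nul_delimited and read_nul_delimited are invented for this example, and UTF-8 is an assumed encoding.

```python
import os
import tempfile


def write_nul_delimited(path, items):
    # Each item is followed by a NUL byte, like printf '%s\0' in the shell.
    with open(path, "wb") as handle:
        for item in items:
            handle.write(item.encode("utf-8") + b"\0")


def read_nul_delimited(path):
    with open(path, "rb") as handle:
        data = handle.read()
    # Every entry ends with NUL, so the final split piece is empty; drop it.
    return [chunk.decode("utf-8") for chunk in data.split(b"\0")[:-1]]


with tempfile.TemporaryDirectory() as tmpdir:
    junk = os.path.join(tmpdir, "junk")
    write_nul_delimited(junk, ["abc", "hello world", "multi\nline"])
    restored = read_nul_delimited(junk)

print(restored)
```

Entries containing spaces or newlines survive unchanged, which is the property that makes this format attractive.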
This answer almost implements shell-like parsing semantics:
foo=( )
while IFS= read -r -d '' entry; do
  foo+=( "$entry" )
done < <(xargs printf '%s\0' <junk)
xargs has some corner cases surrounding multi-line strings where its parsing isn't quite identical to how a shell does. It's a 99% solution, however.
This answer requires a Python interpreter:
The Python standard library shlex module supports POSIX-compliant string tokenization which is more true to the standard than that implemented by xargs. Note that bash/ksh extensions such as $'foo' are not honored.
shlex_split() {
  python -c '
import shlex, sys
for item in shlex.split(sys.stdin.read()):
    sys.stdout.write(item + "\0")
'
}
foo=( )
while IFS= read -r -d '' entry; do
  foo+=( "$entry" )
done < <(shlex_split <junk)
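To see what the tokenizer does with the inputs from the question, shlex.split can be called directly. This is just a quick check of the standard library function on the question's sample strings.

```python
import shlex

# The file contents from the question, one case per entry.
cases = [
    "abc 'hello world'",
    "'hello world'",
    "hello\\ world",
]

for text in cases:
    print(text, "->", shlex.split(text))
```

Each quoted or backslash-escaped phrase comes back as a single list element, which is the behaviour the question asks for.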
These answers pose a security risk:
...specifically, if the contents of junk can be written to contain shell-sensitive code (like $(rm -rf /)), you don't want to use either of them:
# use declare
declare "foo=($(cat junk))"
# ...or use eval directly
eval "foo=( $(cat junk) )"
If you want to be sure that junk is written in a way that's safe to read back like this, and you control the code that writes it, consider:
# write foo array to junk in an eval-safe way, if it contains at least one element
{ printf '%q ' "${foo[@]}" && printf '\n'; } >junk;
Alternately, you could use:
# write a command which, when evaluated, will recreate the variable foo
declare -p foo >junk
and:
# run all commands in the file junk
source junk
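If the file is written from Python instead of bash, shlex.quote plays a role loosely similar to printf '%q'. The sketch below rests on that assumption: its output is POSIX-shell-safe but not byte-for-byte what bash's %q would produce, and quote_args is a hypothetical helper.

```python
import shlex


def quote_args(items):
    # Quote each item so that a POSIX shell (or shlex.split) reads it back
    # as exactly one word, even if it contains spaces or $( ... ).
    return " ".join(shlex.quote(item) for item in items)


original = ["abc", "hello world", "$(rm -rf /)"]
line = quote_args(original)
print(line)

# Reading the line back recovers the original items.
print(shlex.split(line) == original)
```

The dangerous-looking third item ends up inside single quotes, so it is stored as plain text.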