Add literal space character to pictured numeric output - string-formatting

After several hours of trial and error, I still couldn't figure out how to do it. Here's the code I'm working on:
: format
( n ds -- )
>r cr .s
dup cr .s
>r cr .s
abs cr .s
s>d cr .s
<# [char] bl hold #s r> sign #> cr .s
r@ cr .s
dup cr .s
c@ cr .s
dup >r cr .s
chars + cr .s
char+ cr .s
swap cr .s
dup cr .s
>r cr .s
cmove cr .s
r> r> + cr .s
r> cr .s
c!
;
As an aside: isn't there an easier way to produce formatted output? Something similar to printf would be great. Another possibility would be to specify a space character as the first character of an s" " kind of string.
EDIT:
I found that in Gforth I can write s\" \040 test" (though the manual says this is not standard), and also s\" \x20 test", which is probably standard, although I don't quite understand which part of the sentence the remark about being standard refers to. Still, I'd be happy to know how to combine it with pictured numeric output.
EDIT2
This is how I'd expect it to be used:
create test 256 allot
s" prefix " test place
123 test format
s" suffix" test place+
test count type \ prefix 123 suffix

I think for this example you don't need to add the space within the <# #> stuff. You can define strings with leading or trailing spaces with s".
So if you start with
\ Push addresses and lengths for the prefix and the number
s" prefix " \ -- addr u
123 s>d <# #s #> \ addr u -- addr u addr2 u2
The word that you want is something that concatenates them, for example:
: concatenate
\ Moves the string addr2 u2 to the end of the string addr u
>r >r \ addr u addr2 u2 -- addr u
dup >r over r> + r> \ addr u -- addr u addr+u addr2
swap r@ \ addr u addr+u addr2 -- addr u addr2 addr+u u2
cmove r> + ; \ addr u addr2 addr+u u2 -- addr u+u2
So if you call this and output the resulting string like this:
concatenate type
The output will be "prefix 123"
You could then apply the same word to the strings "prefix 123" and " suffix".
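This doesn't use exactly the same memory locations as your example, but it could be adapted. It was just the easiest way I could demonstrate it. Note that cmove copies the second string directly after the end of the first, so addr must point into a buffer with room for u+u2 characters.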
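In response to the comment: you seem to be pretty close to embedding characters in pictured output. I think you just need to remove the [char]. bl already pushes the space character, whereas [char] bl takes the first character of the word bl, which is b. For example: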
123 s>d <# # bl hold # bl hold # #>
Should generate a string like "1 2 3"
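To see why this works, remember that pictured numeric output fills its buffer from the right: each # converts the lowest remaining digit and prepends it, and HOLD prepends a single character. The following is only a rough bash emulation of that order of operations. It is not Forth code, and pictured_spaced is a made-up name. It assumes a non-negative number and a fixed digit count, like the three # in the snippet above:

```bash
#!/usr/bin/env bash
# pictured_spaced N WIDTH: emulate Forth's "# bl hold # bl hold ... #" in bash.
# Hypothetical helper for illustration only; N must be a non-negative integer.
pictured_spaced() {
  local n=$1 width=$2 buf="" i
  for (( i = 1; i <= width; i++ )); do
    buf="$(( n % 10 ))$buf"          # like "#": convert the lowest digit, prepend it
    n=$(( n / 10 ))
    (( i < width )) && buf=" $buf"   # like "bl hold": prepend a space between digits
  done
  printf '%s\n' "$buf"
}

pictured_spaced 123 3   # prints: 1 2 3   (like <# # bl hold # bl hold # #>)
pictured_spaced 5 3     # prints: 0 0 5
```

The second call shows how a fixed number of # differs from #s. A plain # always emits a digit, even a zero, so the result is padded with zeros. #s instead stops once the number is used up.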

Related

bash array into string pattern

How do I change a bash array into a string with this pattern? Let's say I have arr=(a b c d) and I want to change it to this pattern:
'a' + 'b' + 'c' + 'd' with whitespace around each +.
PS: I figured out the pattern 'a'+'b'+'c'+'d', but I'm not sure how to put " + " instead of just "+" in between.
In pure bash without resorting to an external command and without creating a subshell, using bash built-in printf's implicit loop with -v option (which assigns the output to the variable rather than printing it):
printf -v str " + '%s'" "${arr[@]}"
str=${str:3} # to strip off leading ' + '
echo "$str"
This will also work with array elements containing blank characters (try with arr=(a b c d "x y") ). This solution assumes arr is not an empty array.
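The same idea can be wrapped in a small reusable function. This is a sketch and not part of the answer above: join_quoted is a hypothetical name, and unlike the snippet it handles an empty array by printing nothing. Because the separator is spliced into the printf format string, it must not contain % or backslashes.

```bash
#!/usr/bin/env bash
# join_quoted SEP ITEM...: print the items wrapped in single quotes, joined by SEP.
# Hypothetical helper built on printf's implicit loop over its arguments.
# Caveat: SEP becomes part of the format string, so avoid % and \ in it.
join_quoted() {
  local sep=$1 out
  shift
  (( $# > 0 )) || return 0            # empty array: print nothing
  printf -v out "${sep}'%s'" "$@"     # the format is reused once per item
  printf '%s\n' "${out:${#sep}}"      # strip the leading separator
}

arr=(a b c d "x y")
join_quoted ' + ' "${arr[@]}"   # prints: 'a' + 'b' + 'c' + 'd' + 'x y'
```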
Like this:
#!/bin/bash
arr=(a b c d)
str="${arr[@]}"
str=${str// / + } # Parameter Expansion
sed "s/[a-z]/'&'/g" <<< "$str"
Output
'a' + 'b' + 'c' + 'd'
See: http://mywiki.wooledge.org/BashFAQ/073 and "Parameter Expansion" in man bash. Also see http://wiki.bash-hackers.org/syntax/pe.
Here’s a version in pure Bash that also tolerates whitespace in arr members.
arr=(a b c d 'e f' 'g h')
aux=("${arr[@]/#/ \'}") # add prefix " '"
aux=("${aux[@]/%/\' }") # add suffix "' "
str="$(IFS=+ # join with "+"
printf '%s' "${aux[*]}")"
str="${str:1:${#str} - 2}" # trim spaces
printf '%s\n' "$str"

concatenate within loop using sed

I have two types of files (A.n and B). A.n files are named A.1, A.2, A.3 etc.
In the B file there is a word like 'Element.xyz' which I want to replace with 'Element_n.xyz', leaving everything else unchanged, and then append the result below the matching A.n file. The combined A.n and B files should be named
final.n, i.e. final.1, final.2, final.3, etc.
So the final.1 file should look like this:
A.1
B file with Element.txt is replaced by Element_1.txt
I tried with this code but failed:
for f in A.*;
sed 's/Element.txt/Element_$f.txt/g' B >> tt;
cat A.$f tt >> final_$f.txt ;
done
Your code would append more and more data to the file tt. Your code would put the output from the B file before, not after, the text from the A file. Furthermore, the single quotes prevent the variable from being visible to sed. If I understand your question correctly, you are looking simply for
for f in A.*; do
n=${f#A.}
( cat "$f"
sed "s/Element.txt/Element_$n.txt/g" B ) >"final_$n".txt
done
The parentheses group the two commands so that their output can be redirected together at once. The file name in $f contains the A. part, so we chop it off with a parameter expansion and store that in $n. The argument to cat should obviously be the file name itself.
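Here is a self-contained run of that loop in a scratch directory, so you can check the result before touching real files. The file contents are invented for illustration. It makes two small, optional changes. A { ...; } group replaces ( ... ), which avoids starting a subshell. The dot in Element\.txt is escaped, so sed matches a literal dot rather than any character.

```bash
#!/usr/bin/env bash
# Demo of the A.n + B -> final_n.txt loop in a throwaway directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'header from A.1\n' > A.1            # invented sample contents
printf 'header from A.2\n' > A.2
printf 'load Element.txt\n' > B

for f in A.*; do
  n=${f#A.}                                 # "A.2" -> "2"
  {
    cat "$f"                                # the A.n file comes first
    sed "s/Element\.txt/Element_$n.txt/g" B # then B, with the number spliced in
  } > "final_$n.txt"
done

cat final_2.txt
# header from A.2
# load Element_2.txt
```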

Printing in Tabular format in TCL/PERL

I have a Tcl script in which a variable receives a collection of data on every loop iteration, and the data is appended to a file. Suppose that in loop 1:
$var = {xy} {ty} {po} {iu} {ii}
and in loop2
$var = {a} {b} {c} {d1} {d2} {e3}
Now the variable is dumped into a file f.txt with puts $file $var, and in the file it comes out like this:
Line number 1: {xy} {ty} {po} {iu} {ii}
Line number 2: {a} {b} {c} {d1} {d2}
Finally, I want to print them to a file in tabular format, like below:
xy a
ty b
po c
iu d1
ii d2
First, read the file in and extract the words on the first two lines:
set f [open "f.txt"]
set words1 [regexp -all -inline {\S+} [gets $f]]
set words2 [regexp -all -inline {\S+} [gets $f]]
close $f
The trick here is that regexp -all -inline returns all matching substrings, and \S+ selects non-whitespace character sequences.
Then, because we're producing tabular output, we need to measure the maximum size of the items in the first list. We might as well measure the second list at the same time.
set len1 [tcl::mathfunc::max {*}[lmap w $words1 {string length $w}]]
set len2 [tcl::mathfunc::max {*}[lmap w $words2 {string length $w}]]
The lmap applies a string length to each word, and then we find the maximum of them. {*} substitutes the list (of word lengths) as multiple arguments.
Now, we can iterate over the two lists and produce formatted output:
foreach w1 $words1 w2 $words2 {
puts [format "%-*s %-*s" $len1 $w1 $len2 $w2]
}
The format sequence %-*s consumes two arguments, one is the length of the field, and the other is the string to put in that field. It left-aligns the value within the field, and pads on the right with spaces. Without the - it would right-align; that's more useful for integers. You could instead use tab characters to separate, which usually works well if the words are short, but isn't so good once you get a wider mix of lengths.
If you're looking to produce an actual Tab-Separated Values file, the csv package in Tcllib will generate those fine with the right (obvious!) options.
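For comparison, here is the same measure-then-pad approach in bash. This is a sketch that assumes the two-line layout from the question. print_two_columns is a hypothetical name. Bash's printf builtin also understands %-*s and takes the field width from the argument list, just like Tcl's format. Only the first column is padded here, which avoids trailing spaces.

```bash
#!/usr/bin/env bash
# print_two_columns FILE: print the words of lines 1 and 2 side by side,
# with column 1 left-aligned to the width of its widest entry.
print_two_columns() {
  local file=$1 line1 line2 w i width=0
  local -a col1 col2
  { IFS= read -r line1; IFS= read -r line2; } < "$file"
  read -ra col1 <<< "$(tr -d '{}' <<< "$line1")"   # strip braces, split on whitespace
  read -ra col2 <<< "$(tr -d '{}' <<< "$line2")"
  for w in "${col1[@]}"; do                        # widest entry in column 1
    (( ${#w} > width )) && width=${#w}
  done
  for i in "${!col1[@]}"; do
    printf '%-*s %s\n' "$width" "${col1[i]}" "${col2[i]-}"
  done
}

demo=$(mktemp)
printf '%s\n' '{xy} {ty} {po} {iu} {ii}' '{a} {b} {c} {d1} {d2}' > "$demo"
print_two_columns "$demo"
```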
Try this:
$ perl -anE 'push @{$vars[$_]}, ($F[$_] =~ s/^[{]|[}]$//gr) for 0.. $#F; END {say join "\t", @$_ for @vars}' f.txt
xy a
ty b
po c
iu d1
ii d2
Command-line switches:
-a : Turn on autosplit on whitespace into the @F array.
-n : Loop over the lines of the input file; with -a, @F holds the words of the current line.
-E : Execute the following argument as a one-liner, with features such as say enabled.
Removing the surrounding braces from each word:
$F[$_] =~ s/^[{]|[}]$//gr
g : global substitution (we want to remove both { and })
r : non-destructive operation; returns the result of the substitution instead of modifying @F
The END block runs after the last input line and prints each collected column as one tab-separated row.
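If Perl isn't available, the same transposition can be sketched with awk. This is an alternative technique, not a translation of the one-liner above, and transpose_braced is a hypothetical name. Output row i collects field i of every input line, joined by tabs, so it also works with more than two input lines.

```bash
#!/usr/bin/env bash
# transpose_braced FILE: strip { } from every line, then print field i of
# every line as output row i, tab-separated (POSIX awk, no gawk extensions).
transpose_braced() {
  awk '
    {
      gsub(/[{}]/, "")                          # strip braces; $0 is re-split
      for (i = 1; i <= NF; i++)
        col[i] = (NR == 1) ? $i : col[i] "\t" $i
      if (NF > max) max = NF
    }
    END { for (i = 1; i <= max; i++) print col[i] }
  ' "$1"
}

tsv_demo=$(mktemp)
printf '%s\n' '{xy} {ty} {po} {iu} {ii}' '{a} {b} {c} {d1} {d2}' > "$tsv_demo"
transpose_braced "$tsv_demo"
```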

How do I appropriately pass associated array values through SED?

I'm creating a Caesar cipher that substitutes each letter in a word with the matching letter from the reversed alphabet. The sample input is "abcdefghijklmnopqrstuvwxyz". The following code produces "abcdefghijklmmlkjihgfedcba". The desired output is the whole alphabet in reverse, but once the substitution reaches the midpoint, it goes back in reverse instead of continuing through to the end.
declare -A origin
x=({a..z})
z=({z..a})
for i in {0..25}
do
origin[${x[i]}]=${z[i]}
done
for x in "${!origin[@]}"
do
sed -i 's/'${x}'/'${origin[${x}]}'/g' test.txt
done
Don't forget that bash can index the characters of a string with ${var:offset:length}. In your script, there is no need for the first two indexed arrays, x and z. Example:
declare -A origin
x=abcdefghijklmnopqrstuvwxyz
z=zyxwvutsrqponmlkjihgfedcba
for i in {0..25}; do
origin[${x:i:1}]=${z:i:1}
done
Nothing like substituting origin[{a..z}] for {z..a} and getting a familiar-looking result back, is there? Look, for example, at the first and last iterations only. On the first iteration, you substitute all a's with z's. Then, on the last iteration, you substitute all the z's (including those you previously replaced a->z in the first iteration) with a's again, effectively undoing your changes.
A better example is to look at the midpoint of the alphabet m->n.
x=abcdefghijklmnopqrstuvwxyz
z=zyxwvutsrqponmlkjihgfedcba
              ||
When your iteration reaches m, you substitute all m's with n's. Then, on the very next iteration, you substitute all n's with m's.
You can see how this makes it look as if only half of the substitutions take effect. After you reach the midpoint in origin, any substitutions only occur once, since you are no longer encountering letters you have already substituted.
The solution using tr previously posted looks like one of your best options.
The approach you used to pass the associative array is correct, but the logic of the Caesar cipher is wrong.
On each iteration of the for loop, the sed command changes some characters in the input file. Such a character can be one that was originally in the input file, or one that an earlier sed already changed. So in effect the loop performs multiple conversions on the same character instead of a single one.
For example
Consider an input file
$ cat test
a z
Now run a compactly formatted version of the script:
for x in "${origin[@]}"; do sed -i "s/$x/${origin[$x]}/g" test; echo "$x ${origin[$x]}"; done;
z a
y b
x c
w d
v e
..
..
Here in the first iteration, the sed would change the z to a. Now the input file would be
$ cat test
a a
Then, on the last (26th) iteration, $x will be a, which converts both a's in the input to z:
$ cat test
z z
Alternate solution
An alternate solution can be written using tr as
$ a=$(echo {z..a} | tr -d " ")
$ b=$(echo {a..z} | tr -d " ")
$ echo {a..z} | tr $b $a
z y x w v u t s r q p o n m l k j i h g f e d c b a
Why does it work?
Here each character is read from the input and replaced with the corresponding character from the tr arguments, which ensures that each character is changed only once.
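The difference can be reproduced with just two letters. The following sketch uses made-up function names. It runs the substitutions one after another with sed, in the same z-then-a order as the trace above, and then does the same swap in a single pass with tr:

```bash
#!/usr/bin/env bash
# sequential_swap: one sed pass per mapping; each pass re-reads the output
# of the previous one, so a letter can be changed twice.
sequential_swap() {
  local s=$1 pair
  for pair in z:a a:z; do                          # same order as the trace above
    s=$(sed "s/${pair%:*}/${pair#*:}/g" <<< "$s")
  done
  printf '%s\n' "$s"
}

# simultaneous_swap: tr maps every input character exactly once.
simultaneous_swap() {
  tr 'az' 'za' <<< "$1"
}

sequential_swap "a z"     # prints: z z
simultaneous_swap "a z"   # prints: z a
```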
I've found a solution to the problem. Thank you both for your help!
#!/bin/bash
#Retrieve the desired shift from user
echo "What number do you want to use for the shift?"
read -r num
#Create an array of all letters
x=({a..z})
#Take user input and use to create the cipher array
case "$num" in
0)
y=({a..z})
;;
1)
y=({{b..z},a})
;;
2)
y=({{c..z},a,b})
;;
3)
y=({{d..z},a,b,c})
;;
4)
y=({{e..z},a,b,c,d})
;;
5)
y=({{f..z},{a..e}})
;;
6)
y=({{g..z},{a..f}})
;;
7)
y=({{h..z},{a..g}})
;;
8)
y=({{i..z},{a..h}})
;;
9)
y=({{j..z},{a..i}})
;;
10)
y=({{k..z},{a..j}})
;;
11)
y=({{l..z},{a..k}})
;;
12)
y=({{m..z},{a..l}})
;;
13)
y=({{n..z},{a..m}})
;;
14)
y=({{o..z},{a..n}})
;;
15)
y=({{p..z},{a..o}})
;;
16)
y=({{q..z},{a..p}})
;;
17)
y=({{r..z},{a..q}})
;;
18)
y=({{s..z},{a..r}})
;;
19)
y=({{t..z},{a..s}})
;;
20)
y=({{u..z},{a..t}})
;;
21)
y=({{v..z},{a..u}})
;;
22)
y=({{w..z},{a..v}})
;;
23)
y=({{x..z},{a..w}})
;;
24)
y=({{y..z},{a..x}})
;;
25)
y=({{z..z},{a..y}})
;;
*)
echo "Sorry, you must use a shift from 0 to 25."
exit 1
;;
esac
#create the string variables for manipulation
fromset=""
toset=""
#place the alphabetic arrays into the string variables
for i in {0..25}
do
fromset="$fromset${x[i]}"
toset="$toset${y[i]}"
done
#Use sed text transformations to alter given files
sed "y/$fromset/$toset/" original.txt > encoded.txt
sed "y/$toset/$fromset/" encoded.txt > decoded.txt
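The 26-branch case statement can also be replaced by building the shifted alphabet with substring expansion, ${lower:key}${lower:0:key}, with tr doing the one-pass mapping. This is a sketch under a few assumptions: it handles lowercase letters only, the shift is a decimal number, and caesar is a made-up function name.

```bash
#!/usr/bin/env bash
# caesar KEY TEXT: shift lowercase letters in TEXT forward by KEY (0-25).
# Hypothetical helper; uppercase letters and other characters pass through unchanged.
caesar() {
  local key=$1 text=$2
  local lower=abcdefghijklmnopqrstuvwxyz rotated
  if ! [[ $key =~ ^[0-9]+$ ]] || (( 10#$key > 25 )); then
    echo "Sorry, you must use a shift from 0 to 25." >&2
    return 1
  fi
  key=$(( 10#$key ))                               # normalise, e.g. "03" -> 3
  rotated=${lower:key}${lower:0:key}               # tail of the alphabet, then the wrapped head
  printf '%s\n' "$text" | tr "$lower" "$rotated"   # one pass: no letter is changed twice
}

caesar 3 "hello world"     # prints: khoor zruog
caesar 23 "khoor zruog"    # 26 - 3 = 23 undoes the shift: hello world
```

To work on files as in the script above, the same two strings can be used as tr "$lower" "$rotated" < original.txt > encoded.txt. Swapping the two arguments decodes.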

Knowing the size of a C function in the compiled object file

It is easy to get the starting address of a function in C, but not its size. So I am currently running nm over the object file to locate my function and THEN locate the starting address of the next function. I need nm because the compiler can (and in my case actually does) reorder functions, so the source order can differ from the object order.
I wonder if there are other ways of doing this. For example, instructing the compiler to preserve source code order in the object file, etc. Maybe some ELF magic?
My compilers are GCC, Clang and Sun Studio. Platforms: Solaris and derivatives, Mac OS X, FreeBSD, with more to come in the future.
I have found that the output of objdump -t xxx will give definitive function size/length values for program and object files (.o).
For example: (From one of my projects)
objdump -t emma | grep " F .text"
0000000000401674 l F .text 0000000000000376 parse_program_header
00000000004027ce l F .text 0000000000000157 create_segment
00000000004019ea l F .text 000000000000050c parse_section_header
0000000000402660 l F .text 000000000000016e create_section
0000000000401ef6 l F .text 000000000000000a parse_symbol_section
000000000040252c l F .text 0000000000000134 create_symbol
00000000004032e0 g F .text 0000000000000002 __libc_csu_fini
0000000000402240 g F .text 000000000000002e emma_segment_count
00000000004022f1 g F .text 0000000000000055 emma_get_symbol
00000000004021bd g F .text 000000000000002e emma_section_count
0000000000402346 g F .text 00000000000001e6 emma_close
0000000000401f00 g F .text 000000000000002f emma_init
0000000000403270 g F .text 0000000000000065 __libc_csu_init
0000000000400c20 g F .text 0000000000000060 estr
00000000004022c3 g F .text 000000000000002e emma_symbol_count
0000000000400b10 g F .text 0000000000000000 _start
0000000000402925 g F .text 000000000000074f main
0000000000401f2f g F .text 000000000000028e emma_open
I've pruned the list a bit, it was lengthy. You can see that the 5th column (the second wide column with lots of zeros....) gives a length value for every function. main is 0x74f bytes long, emma_close is 0x1e6, parse_symbol_section is a paltry 0x0a bytes... 10 bytes! (wait... is that a stub?)
Additionally, I grep'd for just the 'F'unctions in the .text section, thus limiting the list further. The -t option to objdump shows only the symbol tables, so it omits quite a bit of other information not particularly useful towards function length gathering.
I suppose you could use it like this:
objdump -t MYPROG | grep "MYFUNCTION$" | awk '{print "0x" $(NF-1)}' | xargs -I{} -- python -c 'print({})'
An example:
00000000004019ea l F .text 000000000000050c parse_section_header
$ objdump -t emma | grep "parse_section_header$" | awk '{print "0x" $(NF-1)}' | xargs -I{} -- python -c 'print({})'
1292
Checks out, since 0x50c == 1292.
I used $(NF-1) to grab the column in awk, since the second field can vary in content and spacing depending on the flags that apply to the symbol. Also note the trailing $ in the grep: it makes a search for main find the main function, not the entry named main.c.
The xargs -I{} -- python -c 'print({})' bit converts the value from hex to decimal (you can see where awk sneaks the 0x prefix in). If anyone can think of an easier way, please chime in.
Ah, I just remembered that I have an alias for objdump which presets the demangle option for objdump. It'll make things easier to match if you add --demangle to the objdump invocation. (I also use --wide, much easier to read, but doesn't affect this particular output).
This works on any ELF object, library, program, object file, as long as it's NOT stripped. (I tested with and without debugging symbols too)
Hope this helps.
(I looked, parse_symbol_section IS a stub.)
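As for an easier hex-to-decimal step: the shell's own printf accepts 0x-prefixed numbers, so python isn't needed. Below is a sketch that reads objdump -t output on standard input and prints the size field in decimal. symbol_size is a hypothetical name. It matches the symbol name exactly against the last field instead of using grep "name$", which would also match, say, foo_main. It assumes the same column layout as the listing above.

```bash
#!/usr/bin/env bash
# symbol_size NAME: read `objdump -t` output on stdin and print NAME's size in decimal.
# Hypothetical helper; assumes the size is the next-to-last field and the name the last.
symbol_size() {
  local hex
  hex=$(awk -v n="$1" '$NF == n { print $(NF-1); exit }')
  if [ -z "$hex" ]; then
    echo "symbol not found: $1" >&2
    return 1
  fi
  printf '%d\n' "0x$hex"      # e.g. 0x000000000000050c -> 1292
}

# Usage, with the emma binary from the listing above:
#   objdump -t emma | symbol_size parse_section_header
```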
Here is an all-awk answer that shows the total size of all functions in a given section:
# call objdump with -t to get the list of symbols
# awk keeps only the rows whose 4th column is .text
# awk sums the values in the 5th column (prefixed with 0x because they are hex, then converted to decimal with the GNU awk strtonum function)
objdump -t MYPROG | awk -F ' ' '($4 == ".text") {sum += strtonum("0x"$5)} END {print sum}'
And here is a version if you want to see only certain functions from a certain section:
# awk keeps only the rows whose section is .rom and whose name contains funcname anywhere
# (we convert the value in column 6 to lowercase to make the match case-insensitive)
# awk sums the values in the 5th column (prefixed with 0x because they are hex, then converted to decimal with strtonum)
objdump -t MYPROG | awk -F ' ' '($4 == ".rom") && (tolower($6) ~ /funcname/) {sum += strtonum("0x"$5)} END {print sum}'
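strtonum is a GNU awk extension, so the sums above won't work with mawk or BSD awk. Below is a portable variant, sketched under the same column assumptions ($4 is the section, $5 the size in hex). It lets awk do only the filtering and does the hex arithmetic in bash with base#number syntax. sum_section_sizes is a hypothetical name.

```bash
#!/usr/bin/env bash
# sum_section_sizes SECTION: read `objdump -t` output on stdin and print the
# total size (decimal) of all symbols whose 4th field equals SECTION.
sum_section_sizes() {
  awk -v s="$1" '$4 == s { print $5 }' | {
    total=0
    while read -r hex; do
      total=$(( total + 16#$hex ))   # bash parses base-16 numbers directly
    done
    echo "$total"
  }
}

# Usage:
#   objdump -t MYPROG | sum_section_sizes .text
```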
