How do I implement an array of strings? - arrays

I tried to implement a word that produces a string from an array when given a number on the stack in Forth.
My first naive attempt was:
create myarray s" Alpha" , s" Beta" , s" Charlie" ,
This was accepted, but it did not work as expected: myarray @ type produces inconsistent output (instead of my naive expectation that it might print "Alpha").
When searching the web, I found in the Gforth documentation that a string created with s" has a limited lifetime, which means that my approach was bound to fail from the beginning. On the other hand, even arrays of regular objects do not seem to be standardized, according to the Arrays in Forth section in Len's Forth Tutorial.
<Update> Apparently, this is not a trivial problem with Forth. There are libraries on the web that implement missing string functionality: FFL (str module) and String Functions by Bernd Paysan. This is a good starting point, although it still requires work to go from there to an array of strings. </Update>
So how can I implement a word that returns a string from a given array?

To address parts of your code: s" leaves addr u on the stack, an address and the length of the string. , stores only one value, so you won't get the desired results that way. 2, might do it, as that stores both of the stack items that represent the string. Once you have done that you need to get both values back too, so 2@ is what you want.
My rewrite would look like this:
create myarray s" Alpha" 2, s" Beta" 2, s" Charlie" 2,
\ Test
myarray 2@ type Alpha **ok**
Getting at the other elements of your array is a bit trickier. When you type myarray you get the address of the start of the data in that dictionary entry, and you can then use 2@ to fetch the contents of the first two cells (which are the address and length of "Alpha"). If you want "Beta" you need the next pair of cells. So you can use
myarray 2 cells + \ increment the address by two cells
to get the address of the pair of cells that describe "Beta", and so on. So in order to access "Beta" you would enter
myarray 2 cells + 2@ type Beta **ok**
I have tested this with gforth and it seems to all work, although I am not sure how to rigorously test for persistence.
Your word would need to be able to do the address incrementing based on what is on the stack to start with. You might want to get into some more create does> stuff. I can give some pointers but I don't want to spoil the fun of discovery.
If I am skipping too many details of what this actually means just say, and I will try again.
Maybe this is too crude, but I had a go at making a "string type" of sorts a while ago.
: string ( addr u "name" -- )
create 2, \ add address and length to dict entry "name"
does> dup cell+ @ swap @ ; \ push addr u
\ Example
s" Some Words" string words **ok**
words type Some Words **ok**
It defines a word with a name of your choosing (in this case "words") that will push the start address and length of your string (in this case "Some Words") when it is interpreted. As far as I know, when the string is in a definition like this it is persistent.
This doesn't answer your question fully, but it might help.
I have had another go at a persistent string. This one definitely allots memory within a dictionary entry and will be safe as long as that word exists. Previously, the string "type" only stored the address and length that s" created, which is only any good until something else writes over that region of memory. This version copies the string from where s" creates it into a dictionary item called "name", where it is guaranteed to last as long as "name" itself.
: string ( addr u "name" -- )
create \ create dict entry called "name"
dup >r here >r \ keep copies of string length and start of "name"'s memory
dup 2 cells + allot \ allot memory for the chars/bytes of the string plus 2 cells
\ for the new addr u
r@ 2 cells + \ get the address two cells from the start of the space for "name"
swap cmove \ copy the string at addr u into the allotted space for "name"
\ Now "name" looks like this: "name" -blank1- -blank2- "the text of the string at addr u"
\ blank1 should be the address of the start of the text = addr2 and blank2 should be u
r@ dup 2 cells + swap ! \ get the address of blank1, copy it, increment by 2 cells to get addr2
\ and then store that in blank1
r> cell+ r> swap ! \ get address of blank1, increment to get address of blank2, then get u and
\ store it in blank2
\ Now "name" looks like this: "name" addr2 u "the text of the string at addr u"
does> dup @ swap cell+ @ ; \ push addr2 u
For amusement, I thought I might show how little sense this makes without helpful formatting
: string-no-comments ( addr u "name" -- )
create dup >r here >r dup 2 cells + allot r@
2 cells + swap cmove r@ dup 2 cells + swap !
r> cell+ r> swap ! does> dup @ swap cell+ @ ;

Firstly, you must ALLOT permanent storage for the strings. In ciforth (my Forth) there is the word $, that does this in the dictionary space.
S" aap" $,
leaves the address of a one-cell count, followed by the characters.
There is no standard word that does something similar; you have to write it yourself. This is assuming ALLOCATE is not available.
Using this the following code saves string pointers temporarily to the stack:
0 s" Alpha" $, s" Beta" $, s" Charlie" $,
Then you must store these pointers in an array, hence the sentinel 0, at the expense of an extra auxiliary word:
: ttt BEGIN DUP WHILE , REPEAT DROP ;
And then
( CREATE string-array) ttt
HERE CONSTANT ttt-end
Now you can address strings as follows:
ttt-end 2 CELLS - ( @+ TYPE )
You may want to add auxiliary words.
This is ugly and cumbersome, but first and foremost it is a standard way to do it.

Related

sliced arrays flatten to len=1?

I am pulling my hair out manipulating arrays in bash. I have an array of strings, which contain spaces. I would like an array containing all but the first element of my input array.
input=("first string" "second string" "third string")
echo ${#input[@]}
# len(input)=3
# get slice of all except for first element of input
slice=${input[@]:1}
echo ${#slice[@]}
# expect 2, but get 1
echo $slice
# second string third string
# slice should contain ("second string" "third string"), but instead is "second string third string"
Slicing the array clearly works to eliminate the first element, but the result appears to be a concatenation of all remaining strings, rather than an array. Is there a way to slice an array in bash and get an array as a result?
(sorry, I'm not new to bash, but I've never used it for much before, and I can't find any documentation showing why my slice is flattened)
First off, you should always quote variable expansions. Be very wary of any solution that relies on unquoted expansions. ShellCheck.net is a great tool for catching bugs related to quoting (among many other issues).
To your specific issue, slice=${input[@]:1} does not do what you want. It defines a single scalar variable slice rather than an array, meaning the array expansion (denoted by the [@]) will first be munged into a single string using the current IFS. Here's a demo:
$ arr=(1 2 '3 4')
$ IFS=,
$ var="${arr[@]:1}"
$ echo "$var"
2,3 4
To instead declare and populate an array use the =() notation, like so:
$ var=("${arr[@]:1}")
$ printf '%s\n' "${var[@]}"
2
3 4
Indexes are reset, element 1 is now element 0:
slice=("${input[@]:1}")
Element and index are removed, the first element is now index 1, not index 0:
unset input[0]
${#slice[@]} or ${#input[@]} will now be 1 less than the previous value of ${#input[@]}. Starting out with three elements in input, the values of "${!slice[@]}" and "${!input[@]}" will be 0 1 and 1 2 respectively (for the first and second approach).
If you don't quote the expansion, as in slice=(${input[@]:1}), each array element is split on whitespace, creating many more elements.
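The points above can be checked with a short script. This is a minimal sketch using the input array from the question; the name broken is only an illustrative variable for the unquoted case.

```bash
#!/usr/bin/env bash
input=("first string" "second string" "third string")

# Quoted slice into =(): each remaining element stays intact,
# and the new array's indexes restart at 0.
slice=("${input[@]:1}")
echo "${#slice[@]}"     # 2
echo "${!slice[@]}"     # 0 1

# Unquoted slice (illustrative name): word splitting breaks the
# elements apart on whitespace.
broken=(${input[@]:1})
echo "${#broken[@]}"    # 4

# unset removes one element but leaves the other indexes as they were.
unset 'input[0]'
echo "${!input[@]}"     # 1 2
```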

Using variables with %w or %W - Ruby

I have a similar issue as this post:
How to use variable inside %w{}
but my issue is a bit different. I want to take a string variable and convert it to an array using %w or %W.
text = gets.chomp # get user text string
#e.g I enter "first in first out"
words = %w[#{text}] # convert text into array of strings
puts words.length
puts words
Console output
1
first in first out
Keeps the text as a block of string and doesn't split it into an array words ["first","in", "first", "out"]
words = text.split (" ") # This works fine
words = %w[#{gets.chomp}] # This doesn't work either
words = %w['#{gets.chomp}'] # This doesn't work either
words = %W["#{gets.chomp}"] # This doesn't work either
words = %w("#{gets.chomp}") # This doesn't work either
%w is not intended to do any splitting at run time; it's a way of expressing that the literal text that follows it in the source should be split. In essence it's just a short-hand notation.
In the case of %W the #{...} chunks are treated as a single token; any spaces contained within are considered an integral part.
The correct thing to do is this:
words = text.strip.split(/\s+/)
Doing things like %W[#{...}] is just as pointless as "#{...}". If you need something cast as a string, call .to_s. If you need something split call split.

For loop to take the value of the whole array each time

Suppose I have 3 arrays, A, B and C
I want to do the following:
A=("1" "2")
B=("3" "4")
C=("5" "6")
for i in $A $B $C; do
echo ${i[0]} ${i[1]}
#process data etc
done
So, basically i takes the value of the whole array each time and I am able to access the specific data stored in each array.
On the 1st loop, i should take the value of the 1st array, A, on the 2nd loop the value of array B etc.
The above code just iterates with i taking the value of the first element of each array, which clearly isn't what I want to achieve.
So the code only outputs 1, 3 and 5.
You can do this in a fully safe and supportable way, but only in bash 4.3 or later (which adds nameref support, a feature ported over from ksh):
for array_name in A B C; do
declare -n current_array=$array_name
echo "${current_array[0]}" "${current_array[1]}"
done
That said, there's hackery available elsewhere. For instance, you can use eval (allowing a malicious variable name to execute arbitrary code, but otherwise safe):
for array_name in A B C; do
eval 'current_array=( "${'"$array_name"'[@]}" )'
echo "${current_array[0]}" "${current_array[1]}"
done
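If you do go the eval route, one way to reduce the risk is to check that each name is a plain identifier before it reaches eval. The following is only a sketch: the regular expression, the output array, and the deliberately bad fourth name are assumptions added for illustration.

```bash
#!/usr/bin/env bash
A=("1" "2")
B=("3" "4")
C=("5" "6")

output=()   # illustrative: collects one line per processed array
for array_name in A B C 'x; echo pwned'; do
  # Accept only plain identifiers; anything else never reaches eval.
  if [[ ! $array_name =~ ^[A-Za-z_][A-Za-z0-9_]*$ ]]; then
    echo "skipping invalid array name: $array_name" >&2
    continue
  fi
  eval 'current_array=( "${'"$array_name"'[@]}" )'
  output+=("${current_array[0]} ${current_array[1]}")
done
printf '%s\n' "${output[@]}"
```

The three valid names are processed and the fourth is skipped with a message on stderr.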
If the elements of the arrays don't contain spaces or wildcard characters, as in your question, you can do:
for i in "${A[*]}" "${B[*]}" "${C[*]}"
do
iarray=($i)
echo ${iarray[0]} ${iarray[1]}
# process data etc
done
"${A[*]}" expands to a single string containing all the elements of A. Then iarray=($i) splits this on whitespace, turning the string back into an array.
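If the elements may contain spaces, the split-and-rejoin approach above loses the element boundaries. A different technique, not mentioned in the answers above, is bash's indirect expansion ${!ref}, which also works in versions older than 4.3. A minimal sketch, with ref and the sample data chosen only for illustration:

```bash
#!/usr/bin/env bash
A=("a 1" "a 2")
B=("b 1" "b 2")

output=()   # illustrative: collects one line per array
for array_name in A B; do
  # ref holds the text "A[@]" (or "B[@]"); "${!ref}" then expands
  # to the elements of that array, one word per element.
  ref="${array_name}[@]"
  current_array=("${!ref}")
  output+=("${current_array[0]} | ${current_array[1]}")
done
printf '%s\n' "${output[@]}"
```

Each copied array keeps its two elements, embedded spaces included.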

Does Lua optimize the ".." operator?

I have to execute the following code:
local filename = dir .. "/" .. base
thousands of times in a loop (it's a recursion that prints a directory tree).
Now, I wonder whether Lua concatenates the 3 strings (dir, "/", base) in one go (i.e., by allocating a string long enough to hold their total lengths) or whether it does this the inefficient way by doing it internally in two steps:
local filename = (dir .. "/") -- step1
.. base -- step2
This last way would be inefficient memory-wise because two strings are allocated instead of just one.
I don't care much about CPU cycles: I care mainly about memory consumption.
Finally, let me generalize the question:
Does Lua allocate only one string, or 4, when it executes the following code?
local result = str1 .. str2 .. str3 .. str4 .. str5
BTW, I know that I could do:
local filename = string.format("%s/%s", dir, base)
But I've yet to benchmark it (memory & CPU wise).
(BTW, I know about table:concat(). This has the added overhead of creating a table so I guess it won't be beneficial in all use cases.)
A bonus question:
In case Lua doesn't optimize the ".." operator, would it be a good idea to define a C function for concatenating strings, e.g. utils.concat(dir, "/", base, ".", extension)?
Although Lua performs a simple optimization on .. usage, you should still be careful about using it in a tight loop, especially when joining very large strings, because this will create lots of garbage and thus impact performance.
The best way to concatenate many strings is with table.concat.
table.concat lets you use a table as a temporary buffer for all the strings to be concatenated and perform the concatenation only when you are done adding strings to the buffer, like in the following silly example:
local buf = {}
for i = 1, 10000 do
buf[#buf+1] = get_a_string_from_somewhere()
end
local final_string = table.concat( buf )
The simple optimization for .. can be seen analyzing the disassembled bytecode of the following script:
-- file "lua_06.lua"
local a = "hello"
local b = "cruel"
local c = "world"
local z = a .. " " .. b .. " " .. c
print(z)
the output of luac -l -p lua_06.lua is the following (for Lua 5.2.2 - edit: the same bytecode is output also in Lua 5.3.6):
main (13 instructions at 003E40A0)
0+ params, 8 slots, 1 upvalue, 4 locals, 5 constants, 0 functions
1 [3] LOADK 0 -1 ; "hello"
2 [4] LOADK 1 -2 ; "cruel"
3 [5] LOADK 2 -3 ; "world"
4 [7] MOVE 3 0
5 [7] LOADK 4 -4 ; " "
6 [7] MOVE 5 1
7 [7] LOADK 6 -4 ; " "
8 [7] MOVE 7 2
9 [7] CONCAT 3 3 7
10 [9] GETTABUP 4 0 -5 ; _ENV "print"
11 [9] MOVE 5 3
12 [9] CALL 4 2 1
13 [9] RETURN 0 1
You can see that only a single CONCAT opcode is generated, although many .. operators are used in the script.
To fully understand when to use table.concat you must know that Lua strings are immutable. This means that whenever you try to concatenate two strings you are indeed creating a new string (unless the resulting string is already interned by the interpreter, but this is usually unlikely). For example, consider the following fragment:
local s = s .. "hello"
and assume that s already contains a huge string (say, 10MB). Executing that statement creates a new string (10MB + 5 characters) and discards the old one. So you have just created a 10MB dead object for the garbage collector to cope with. If you do this repeatedly you end up hogging the garbage collector. This is the real problem with .. and this is the typical use case where it is necessary to collect all the pieces of the final string in a table and to use table.concat on it: this won't avoid the generation of garbage (all the pieces will be garbage after the call to table.concat), but you will greatly reduce unnecessary garbage.
Conclusions
Use .. whenever you concatenate few, possibly short, strings, or you are not in a tight loop. In this case table.concat could give you worse performance because:
- you must create a table (which usually you would throw away);
- you have to call the function table.concat (the function call overhead impacts performance more than using the built-in .. operator a few times).
Use table.concat if you need to concatenate many strings, especially if one or more of the following conditions are met:
- you must do it in subsequent steps (the .. optimization works only inside the same expression);
- you are in a tight loop;
- the strings are large (say, several kBs or more).
Note that these are just rules of thumb. Where performance is really paramount you should profile your code.
Anyway Lua is quite fast compared with other scripting languages when dealing with strings, so usually you don't need to care so much.
In your example, whether the .. operator is optimized is hardly a problem for performance; you don't have to worry about memory or CPU. And there's table.concat for concatenating many strings (see Programming in Lua for the use of table.concat).
Back to your question, in this piece of code
local result = str1 .. str2 .. str3 .. str4 .. str5
Lua allocates only one new string, check out this loop from Lua's relevant source in luaV_concat:
do { /* concat all strings */
size_t l = tsvalue(top-i)->len;
memcpy(buffer+tl, svalue(top-i), l * sizeof(char));
tl += l;
} while (--i > 0);
setsvalue2s(L, top-n, luaS_newlstr(L, buffer, tl));
total -= n-1; /* got 'n' strings to create 1 new */
L->top -= n-1; /* popped 'n' strings and pushed one */
You can see that Lua concatenates n strings in this loop but pushes only one string back onto the stack at the end, which is the result string.
"BTW, I know about table:concat(). This has the added overhead of creating a table so I guess it won't be beneficial in all use cases."
In this particular use case (and similar ones), you could consider reusing a table if you're concerned with creating lots of garbage tables:
local path = {}
...
-- someplace else, in a loop or function:
path[1], path[2] = dir, base
local filename = table.concat(path, "/")
path[1], path[2] = nil
...
you could even generalize this to a "concat" utility:
local rope = {}
function string_concat(...)
for i = 1, select("#", ...) do rope[i] = select(i, ...) end -- prepare rope
local res = table.concat(rope)
for i = 1, select("#", ...) do rope[i] = nil end -- clear rope
return res
end

Access first element of string in Ada

I have a string passed into a function, I would like to compare the first character of the string against a number.
I.E.
if String(1) = "3" then
When I compile I get:
warning: index for String may assume lower bound of 1
warning: suggested replacement String'First + 1
I would really like to make this right, but when I try "first" it actually grabs a number, not the character.
Is there a better way to do it?
I tried looking up the 'First concept, and the below site explains I'm actually getting the number of the index, not the actual contents: http://en.wikibooks.org/wiki/Ada_Programming/Types/array
For example,
Hello_World : constant String := "Hello World!";
World : constant String := Hello_World (7 .. 11);
Empty_String : constant String := "";
Using 'First I'll get:
Array 'First 'Last 'Length 'Range
Hello_World 1 12 12 1 .. 12
World 7 11 5 7 .. 11
Empty_String 1 0 0 1 .. 0
Based on that information, I can't get H from Hello world (for a comparison like if Hello_World(1) = "H" then)
EDIT:
So the way I initially was doing it was
(insert some variable name instead of string in this case)
String(String'First .. String'First) = "1"
So that works from what I can tell; however, rather than writing all that, I found out that
String(String'First) = '1'
Does the same thing but using char comparison, which makes a lot more sense!
Thanks for all the answers everyone!
Strings are the biggest bugaboo for newbie Ada coders; particularly so for those who are already experts at dealing with strings in Cish languages.
Ada strings (in fact all Ada arrays) are not 0-based like C, or 1-based like Fortran. They are based however the coder felt like it. If someone wants to index their string from 10 .. 200, they can. So really the safest way to access characters in an Ada string is to use the 'first attribute (or better yet, loop through them using 'range or 'first .. 'last).
In your case it looks like you want to get at only the first character in the string. The easiest and safest way to do that for a string named X is X(X'first).
In practice you would almost never do that though. Instead you would be looping through the string's 'first .. 'last looking for something, or just using one of the routines in Ada.Strings.Fixed.
The warning is suggesting you use:
String(String'First + Index)
Instead of just
String(Index)
There's something odd about the code in your question. First off, that you're calling your variable "String" and that it's of type "String". Ada will balk at that right off the bat.
And the warning statements you reproduce for that code fragment don't make sense.
Let's say your variable is actually called "Value", i.e.:
Value : String := "34543";
Value(1) is not the same as Value(Value'First + 1), because Value'First (in this declaration) is 1. So you end up referencing Value(1 + 1). You appear to be running into this, since you mention that you can't reference the 'H' in a "Hello World" string.
Now the warning is valid, in that you're safer using 'First (and 'Last and 'Range) to reference array bounds. But you need to use the proper indexing if you're going to offset from the bound retrieved via 'First, typically using either 0-based or 1-based (in which case you need to offset by 1). Use whichever base is more appropriate and readable in your context.

Resources