I'm going to create a program that can generate strings from L-system grammars.
Aristid Lindenmayer's original L-system for modelling the growth of algae is:
variables : A B
constants : none
axiom : A
rules : (A → AB), (B → A)
which produces:
iteration | resulting model
0 | A
1 | AB
2 | ABA
3 | ABAAB
4 | ABAABABA
5 | ABAABABAABAAB
I have naively implemented this in J like this:
algae =: 1&algae : (([: ; (('AB'"0)`('A'"0) @. ('AB' i. ]))&.>"0)^:[) "1 0 1
(i.6) ([;algae)"1 0 1 'A'
┌─┬─────────────┐
│0│A │
├─┼─────────────┤
│1│AB │
├─┼─────────────┤
│2│ABA │
├─┼─────────────┤
│3│ABAAB │
├─┼─────────────┤
│4│ABAABABA │
├─┼─────────────┤
│5│ABAABABAABAAB│
└─┴─────────────┘
Step-by-step illustration:
('AB' i. ]) 'ABAAB' NB. determine indices of productions for each variable
0 1 0 0 1
'AB'"0`('A'"0)@.('AB' i. ])"0 'ABAAB' NB. apply corresponding productions
AB
A
AB
AB
A
'AB'"0`('A'"0)@.('AB' i. ])&.>"0 'ABAAB' NB. the same with &.> to avoid filling
┌──┬─┬──┬──┬─┐
│AB│A│AB│AB│A│
└──┴─┴──┴──┴─┘
NB. finally ; and use ^: to iterate
By analogy, here is the result of the 4th iteration of an L-system that generates the Thue–Morse sequence:
4 (([: ; (0 1"0)`(1 0"0)@.(0 1 i. ])&.>"0)^:[) 0
0 1 1 0 1 0 0 1 1 0 0 1 0 1 1 0
That is the best that I can do so far. I believe that the boxing-unboxing method is insufficient here. This is the first time I've missed linked lists in J: it's much harder to code grammars without them.
What I'm really thinking about is:
a) constructing a list of gerunds of the functions that build the final string (in my examples those functions are constants like 'AB'"0, but for tree modelling they would be turtle graphics commands) and evoking it with (`:6),
or something that I am able to code:
b) constructing a string containing a legal J sentence that builds the final string and executing it with (".).
But I'm not sure if these programs are efficient.
Can you show me a better approach please?
Any hints as well as comments about a) and b) are highly appreciated!
The following will pad the rectangular array with spaces:
L=: rplc&('A';'AB';'B';'A')
L^:(<6) 'A'
A
AB
ABA
ABAAB
ABAABABA
ABAABABAABAAB
Or if you don't want padding:
L&.>^:(<6) <'A'
┌─┬──┬───┬─────┬────────┬─────────────┐
│A│AB│ABA│ABAAB│ABAABABA│ABAABABAABAAB│
└─┴──┴───┴─────┴────────┴─────────────┘
Obviously you'll want to inspect rplc / stringreplace to see what is happening under the covers.
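For readers who would rather not dig into rplc right away: as I understand it, the key property is that all rules are applied in one left-to-right pass, so text produced by one rule is not rewritten by another rule within the same step. Here is a minimal Python sketch of that idea, assuming single-character symbols. The names RULES, step and iterate are made up for illustration and are not part of J.

```python
# Hypothetical names; assumes every symbol is a single character.
RULES = {"A": "AB", "B": "A"}

def step(s, rules=RULES):
    # Rewrite every symbol in parallel; symbols without a rule are kept as-is.
    return "".join(rules.get(ch, ch) for ch in s)

def iterate(axiom, n, rules=RULES):
    # Return the axiom followed by the results of the first n rewriting steps.
    out = [axiom]
    for _ in range(n):
        out.append(step(out[-1], rules))
    return out

print(iterate("A", 5))
```

Because each step only looks at the previous string, swapping in a different rule table (for example {0: ..., 1: ...} style rules for Thue–Morse, with a list instead of a string) needs no change to the loop itself.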
You can use complex values in the left argument of # to expand an array without boxing.
For this particular L-system, I'd probably skip the gerunds and use a temporary substitution:
to =: 2 : 'n (I.m=y) } y' NB. replace m with n in y
ins =: 2 : '(1 j. m=y) #!.n y' NB. insert n after each m in y
L =: [: 'c'to'A' [: 'A'ins'B' [: 'B'to'c' ]
Then:
L^:(<6) 'A'
A
AB
ABA
ABAAB
ABAABABA
ABAABABAABAAB
Here's a more general approach that simplifies the code by using numbers and a gerund composed of constant functions:
'x'-.~"1 'xAB'{~([:,(0:`(1:,2:)`1:)@.]"0)^:(<6) 1
A
AB
ABA
ABAAB
ABAABABA
ABAABABAABAAB
The AB are filled in at the end for display. There's no boxing here because I use 0 as a null value. These get scattered around quite a bit but the -.~"1 removes them. It does pad all the resulting strings with nulls on the right. If you don't want that, you can use <@-.~"1 to box the results instead:
'x'<@-.~"1 'xAB'{~([:,(0:`(1:,2:)`1:)@.]"0)^:(<6) 1
┌─┬──┬───┬─────┬────────┬─────────────┐
│A│AB│ABA│ABAAB│ABAABABA│ABAABABAABAAB│
└─┴──┴───┴─────┴────────┴─────────────┘
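The null-value trick may be easier to see outside J. The following Python sketch mimics it under my own assumptions: codes 0, 1, 2 stand for null, A and B, every production is padded with 0 to width 2 (which is what J's fill does when the gerund results have different lengths), and the nulls are stripped only when rendering. EXPANSION, step and render are hypothetical names.

```python
# Hypothetical sketch: 0 is the null symbol, 1 stands for A, 2 for B.
EXPANSION = {0: (0, 0), 1: (1, 2), 2: (1, 0)}  # every row padded to width 2
ALPHABET = "xAB"

def step(codes):
    # Expand each code to its fixed-width row, then flatten the rows.
    return [c for code in codes for c in EXPANSION[code]]

def render(codes):
    # Map codes back to characters and drop the null placeholder.
    return "".join(ALPHABET[c] for c in codes if c != 0)

codes = [1]
results = []
for _ in range(6):
    results.append(render(codes))
    codes = step(codes)
print(results)
```

Note that the working list doubles in length at every step regardless of how many nulls it holds, which is the price paid for avoiding boxes.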
What is the best practice for inserting an element into an array at an arbitrary position in J?
I guess this is sort of a double question: my main issue is figuring out how to provide three arguments to the verb I want to create. The gist of the code that I want to write is
insert =. dyad : '(n {. y) , x , (n }. y)'
for a position n. The best solution to this that I can think of is taking a two-length array of boxes as the right argument and the position as the left, but that seems a bit clunky
insert =. dyad : 0
NB. the array to be inserted is the first argument
i =. > {. y
NB. the original array is the second argument
a =. > {: y
(x {. a) , i , (x }. a)
)
EDIT: Furthermore, would it be possible to take an array of indices to insert the item at and an array of items to be inserted at those indices -- i.e. inserting multiple items at a time? It seems to me like this is something J would be good at, but I'm not sure how it would be done.
Boxing the arguments is an often used technique. You can use multiple assignment for cleaner code:
f =: 3 : 0
'arg1 arg2' =: y
)
f (i.5);(i.9) NB. arg1 is i.5, arg2 is i.9
To insert array a at position n in L, you can more compactly write:
n ({., a, }.) L
Another way to insert an element into an array is to fill with #!.. Some examples:
1 1 1j2 1 (#!.999) 1 2 3 4
1 2 3 999 999 4
1j1 1 1j1 1 (#!.999) 1 2 3 4
1 999 2 3 999 4
1 1 0j1 1 (#!.999) 1 2 3 4
1 2 999 4
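To make the pattern in those examples explicit: a count of the form a j b keeps a copies of the item and then adds b copies of the fill. Here is a rough Python model of that behaviour (copy_with_fill is a made-up name, and Python writes the complex number 1j2 as 1+2j):

```python
def copy_with_fill(counts, items, fill):
    # counts[i] = a + bj means: a copies of items[i], then b copies of fill.
    out = []
    for count, item in zip(counts, items):
        c = complex(count)
        out.extend([item] * int(c.real))
        out.extend([fill] * int(c.imag))
    return out

print(copy_with_fill([1, 1, 1 + 2j, 1], [1, 2, 3, 4], 999))
```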
Depending on your needs, there are many other tricks you can use, like shifting by n with n |. and then undoing the shift with dual &.:
a,&. (n |. ]) L
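For comparison, here are the two ideas (split and rejoin versus rotate, prepend, rotate back) written out in Python. This is only a sketch of the general technique, not a literal translation of the J expressions, and all three function names are hypothetical.

```python
def insert_by_split(n, a, seq):
    # Take the first n items, append a, then append the rest.
    return seq[:n] + a + seq[n:]

def rotate(seq, n):
    # Rotate left by n (negative n rotates right).
    n %= len(seq) if seq else 1
    return seq[n:] + seq[:n]

def insert_by_rotation(n, a, seq):
    # Rotate so position n is at the front, prepend a, then rotate back.
    shifted = a + rotate(seq, n)
    return rotate(shifted, -n)

print(insert_by_split(2, [9, 9], [1, 2, 3, 4]))
print(insert_by_rotation(2, [9, 9], [1, 2, 3, 4]))
```

Both versions assume 0 <= n <= len(seq).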
(reply to the comment that got too long)
From both a readability and a performance standpoint, the two methods are about the same. I would slightly favor the first as more readable but would probably use the second.
You can use the timespacex verb to check the performance, e.g.:
NB. define the different methods
f1 =: 4 :'x ({., a, }.) y'
f2 =: 4 :' a,&. (x |. ]) y'
NB. set some parameters
a =: 1000 $ 9
L =: 1e6 $ 5
n =: 333456
NB. check if the two methods give identical results
(n f1 L) -: (n f2 L)
1
NB. iterate 100 times to get performance averages
100 timespacex'n f1 L'
0.00775349 2.09733e7
100 timespacex'n f2 L'
0.00796431 1.67886e7
I am trying to make a 2D array in LC3. So far I am thinking of initializing a block of memory using .BLKW and then loading into that another array into each entry. This doesn't seem like it will lead me on the right track though. Any suggestions?
You can definitely do it with .BLKW and also with .STRINGZ, though the latter is admittedly a bit unusual.
The bigger, usual question is around how you decide to "get" and "put" data into that specific area of memory. There are several ways to do that for sure (no one right answer).
Your initial thoughts are cool and valid, but definitely seem to me to be more complex, especially in LC3.
A more direct "row major" or "column major" form of storage, where successive memory locations represent the next entry in a row (row major) or the next entry in a column (column major), is the standard way to do this.
Basically you want to allocate that area of memory, and then write two functions: one to put an item at location (r, c), and one to get an item from location (r, c).
For this, you will hopefully only need to store items that are small enough to fit into one 16-bit memory location on the LC3. That could be a number or a character. (Anything bigger than 16 bits is doable but adds more complexity to your program.)
If you want a fully roughed out sample, you can find that here: http://lc3tutor.org/#array2Dcolordersmp (or just go to lc3tutor.org and look at the 2D array sample).
If you want to learn and try this on your own, you can just read the description there and ignore the sample code (best if you are doing this for homework and want to be sure you learn it). Otherwise, the code there should run fine in the browser-based LC3 simulator referenced there.
Good luck!
Jeff
PS Here's the pre-amble to that code, if you want to just work from this... hopefully this example helps anchor the col major approach taken in the full code sample:
.ORIG x3000
BR MAIN; jump over storage below to the start of the main section
.STRINGZ "ABCDEFGHIJKLMNOPQRSTUVWXYZ"; slightly tricky - we are storing a sequence of letters in our 2D array for reference.
; The address of the above string STARTS AT x3001,
; which you will see is the same as the 2D_ARRAY label value below.
; This is essentially our 2D_ARRAY, starting at x3001 and taking up 26 locations,
; plus 1 (for the null terminator on the string).
; We will assume the 2D array has 13 rows and 2 columns.
; Two letters per row and 13 letters per column. 26 letters.
; So our NUM_ROW label will be 13 and our NUM_COL label will be 2. (see labels below)
; We will treat this array as a column-major stored array.
; Based on our string above, that means the cells of the first
; column (column #0 by our conventions) are: A-M.
; And the cells of the second column (column #1) are: N-Z.
; If we were storing the array in row-major form, then the cells of the first ROW
; would be A, B, and the second ROW would be C, D. Etc.
; With the column-major layout we actually use, the array looks like this:
;
;R\C | 0 | 1
; ------------
; 0 | A | N
; 1 | B | O
; 2 | C | P
; 3 | D | Q
; 4 | E | R
; 5 | F | S
; 6 | G | T
; 7 | H | U
; 8 | I | V
; 9 | J | W
; 10 | K | X
; 11 | L | Y
; 12 | M | Z
; such that 2D_ARRAY[ROW=8, COL=1] would be the letter "V"
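The address arithmetic behind that table is the same in any language, so here is a small Python sketch of the column-major get and put routines described above. The flat list stands in for the LC3 memory block; the names memory, offset, get and put are my own and do not appear in the linked sample.

```python
NUM_ROW = 13
NUM_COL = 2
# Flat storage standing in for the LC3 memory block that holds the string.
memory = list("ABCDEFGHIJKLMNOPQRSTUVWXYZ")

def offset(r, c):
    # Column-major: a whole column is stored before the next one begins.
    return c * NUM_ROW + r

def get(r, c):
    return memory[offset(r, c)]

def put(r, c, value):
    memory[offset(r, c)] = value

print(get(8, 1))  # V
```

For row-major storage the only change would be offset(r, c) = r * NUM_COL + c. In LC3 itself the multiplication has to be done by repeated addition, since there is no multiply instruction.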
I have a longitudinal dataset of 18 time periods. For reasons not to be discussed here, this dataset is in the wide shape, not in the long one. More precisely, time-varying variables have an alphabetic prefix which identifies the time it belongs to. For the sake of this question, consider a quantity of interest called pay. This variable is denoted apay in the first period, bpay in the second, and so on, until rpay.
Importantly, different observations have missing values in this variable in different periods, in an unpredictable way. In consequence, running a panel for the full number of periods will reduce my number of observations considerably. Hence, I would like to know precisely how many observations a panel with different lengths will have. To evaluate this, I want to create variables that, for each period and for each number of consecutive periods count how many respondents have the variable with that time sequence. For example, I want the variable b_count_2 to count how many observations have nonmissing pay in the first period and the second. This can be achieved with something like this:
local b_count_2 = 0
if apay != . & bpay != . {
local b_count_2 = `b_count_2' + 1 // update for those with nonmissing pay in both periods
}
Now, since I want to do this automatically, this has to be in a loop. Moreover, there are different numbers of sequences for each period. For example, for the third period, there are two sequences (those with pay in period 2 and 3, and those with sequences in period 1, 2 and 3). Thus, the number of variables to create is 1+2+3+4+...+17 = 153. This variability has to be reflected in the loop. I propose a code below, but there are bits that are wrong, or of which I'm unsure, as highlighted in the comments.
local list b c d e f g h i j k l m n o p q r // periods over which iterate
foreach var of local list { // loop over periods
local counter = 1 // counter to update; reflects sequence length
while `counter' > 0 { // loop over sequence lengths
gen _`var'_counter_`counter' = 0 // generate variable with counter
if `var'pay != . { // HERE IS PROBLEM 1. NEED TO MAKE THIS TO CHECK CONDITIONS WITH INCREASING NUMBER OF ELEMENTS
recode _`var'_counter_`counter' (0 = 1) // IM NOT SURE THIS IS HOW TO UPDATE SPECIFIC OBSERVATIONS.
local counter = `counter' - 1 // update counter to look for a longer sequence in the next iteration
}
}
local counter = `counter' + 1 // HERE IS PROBLEM 2. NEED TO STOP THIS LOOP! Otherwise counter goes to infinity.
}
An example of the result of the above code (if right) is the following. Consider a dataset of five observations, for four periods (denoted a, b, c, and d):
Obs a b c d
1 1 1 . 1
2 1 1 . .
3 . . 1 1
4 . 1 1 .
5 1 1 1 1
where 1 means value is observed in that period, and . is not. The objective of the code is to create 1+2+3=6 new variables such that the new dataset is:
Obs a b c d b_count_2 c_count_2 c_count_3 d_count_2 d_count_3 d_count_4
1 1 1 . 1 1 0 0 0 0 0
2 1 1 . . 1 0 0 0 0 0
3 . . 1 1 0 0 0 1 0 0
4 . 1 1 . 0 1 0 0 0 0
5 1 1 1 1 1 1 1 1 1 1
Now, why is this helpful? Well, because now I can run a set of summarize commands to get a very nice description of the dataset. The code to print this information in one go would be something like this:
local list a b c d e f g h i j k l m n o p q r // periods over which iterate
foreach var of local list { // loop over periods
local list `var'_counter_* // group of sequence variables for each period
foreach var2 of local list { // loop over each element of the list
quietly sum `var'_counter_`var2' if `var'_counter_`var2' == 1 // sum the number of individuals with value = 1 with sequence of length var2 in period var
di as text "Wave `var' has a sequence of length `var2' with " as result r(N) as text " observations." // print result
}
}
For the above example, this produces the following output:
"Wave 'b' has a sequence of length 2 with 3 observations."
"Wave 'c' has a sequence of length 2 with 2 observations."
"Wave 'c' has a sequence of length 3 with 1 observations."
"Wave 'd' has a sequence of length 2 with 2 observations."
"Wave 'd' has a sequence of length 3 with 1 observations."
"Wave 'd' has a sequence of length 4 with 1 observations."
This gives me a nice summary of the trade-offs I'm having between a wider panel and a longer panel.
If you insist on doing this with data in wide form, it is very inefficient to create extra variables just to count patterns of missing values. You can create a single string variable that contains the pattern for each observation. Then, it's just a matter of extracting from this pattern variable what you are looking for (i.e. patterns of consecutive periods up to the current wave). You can then loop over lengths of the matching patterns and do counts. Something like:
* create some fake data
clear
set seed 12341
set obs 10
foreach pre in a b c d e f g {
gen `pre'pay = runiform() if runiform() < .8
}
* build the pattern of missing data
gen pattern = ""
foreach pre in a b c d e f g {
qui replace pattern = pattern + cond(mi(`pre'pay), " ", "`pre'")
}
list
qui foreach pre in b c d e f g {
noi dis "{hline 80}" _n as res "Wave `pre'"
// the longest substring without a space up to the wave
gen temp = regexs(1) if regexm(pattern, "([^ ]+`pre')")
noi tab temp
// loop over the various substring lengths, from 2 to max length
gen len = length(temp)
sum len, meanonly
local n = r(max)
forvalues i = 2/`n' {
count if length(temp) >= `i'
noi dis as txt "length = " as res `i' as txt " obs = " as res r(N)
}
drop temp len
}
If you are open to working in long form, then here is how you would identify spells with contiguous data and how to loop to get the info you want (the data setup is exactly the same as above):
* create some fake data in wide form
clear
set seed 12341
set obs 10
foreach pre in a b c d e f g {
gen `pre'pay = runiform() if runiform() < .8
}
* reshape to long form
gen id = _n
reshape long @pay, i(id) j(wave) string
* identify spells of contiguous periods
egen wavegroup = group(wave), label
tsset id wavegroup
tsspell, cond(pay < .)
drop if mi(pay)
foreach pre in b c d e f g {
dis "{hline 80}" _n as res "Wave `pre'"
sum _seq if wave == "`pre'", meanonly
local n = r(max)
forvalues i = 2/`n' {
qui count if _seq >= `i' & wave == "`pre'"
dis as txt "length = " as res `i' as txt " obs = " as res r(N)
}
}
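The counting logic shared by both versions can also be sketched outside Stata. The Python below is an illustration only: it uses the five-observation example from the question, computes for each wave the length of the unbroken run of non-missing values ending there (what _seq holds after tsspell), and then counts observations per wave and minimum run length. The function names are hypothetical.

```python
# The five-observation example from the question; None marks a missing value.
WAVES = "abcd"
DATA = [
    [1, 1, None, 1],
    [1, 1, None, None],
    [None, None, 1, 1],
    [None, 1, 1, None],
    [1, 1, 1, 1],
]

def run_lengths(row):
    # For each wave, length of the unbroken non-missing run ending at that wave.
    out, run = [], 0
    for value in row:
        run = run + 1 if value is not None else 0
        out.append(run)
    return out

def sequence_counts(data, waves=WAVES):
    # counts[(wave, k)] = observations whose run ending at wave is at least k long.
    counts = {}
    for row in data:
        for j, run in enumerate(run_lengths(row)):
            for k in range(2, run + 1):
                counts[(waves[j], k)] = counts.get((waves[j], k), 0) + 1
    return counts

for (wave, k), n in sorted(sequence_counts(DATA).items()):
    print(f"Wave {wave}: length = {k}, obs = {n}")
```

The resulting counts agree with the output the question asks for (3 observations for wave b at length 2, and so on), without creating one variable per wave and length.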
I echo @Dimitriy V. Masterov in genuine puzzlement that you are using this dataset shape. It can be convenient for some purposes, but for panel or longitudinal data such as you have, working with it in Stata is at best awkward and at worst impracticable.
First, note specifically that
local b_count_2 = 0
if apay != . & bpay != . {
local b_count_2 = `b_count_2' + 1 // update for those with nonmissing pay in both periods
}
will only ever be evaluated in terms of the first observation, i.e. as if you had coded
if apay[1] != . & bpay[1] != .
This is documented here. Even if it is what you want, it is not usually a pattern for others to follow.
Second, and more generally, I haven't tried to understand all of the details of your code, as what I see is the creation of a vast number of variables even for tiny datasets as in your sketch. For a series T periods long, you would create a triangular number [(T - 1)T]/2 of new variables; in your example (17 x 18)/2 = 153. If someone had series 100 periods long, they would need 4950 new variables.
Note that because of the first point just made, these new variables would pertain with your strategy only to individual variables like pay and individual panels. Presumably that limitation to individual panels could be fixed, but the main idea seems singularly ill-advised in many ways. In a nutshell, what strategy do you have to work with these hundreds or thousands of new variables except writing yet more nested loops?
Your main need seems to be to identify spells of non-missing and missing values. There is easy machinery for this long since developed. General principles are discussed in this paper and an implementation is downloadable from SSC as tsspell.
On Statalist, people are asked to provide workable examples with data as well as code. See this FAQ. That's entirely equivalent to long-standing requests here for MCVE.
Despite all that advice, I would start by looking at the Stata command xtdescribe and associated xt tools already available to you. These tools do require a long data shape, which reshape will provide for you.
Let me add another answer based on the example now added to the question.
Obs a b c d
1 1 1 . 1
2 1 1 . .
3 . . 1 1
4 . 1 1 .
5 1 1 1 1
The aim of this answer is not to provide what the OP asks but to indicate how many simple tools are available to look at patterns of non-missing and missing values, none of which entail the creation of large numbers of extra variables or writing intricate code based on nested loops for every new question. Most of those tools require a reshape long.
. clear
. input a b c d
a b c d
1. 1 1 . 1
2. 1 1 . .
3. . . 1 1
4. . 1 1 .
5. 1 1 1 1
6. end
. rename (a b c d) (y1 y2 y3 y4)
. gen id = _n
. reshape long y, i(id) j(time)
(note: j = 1 2 3 4)
Data wide -> long
-----------------------------------------------------------------------------
Number of obs. 5 -> 20
Number of variables 5 -> 3
j variable (4 values) -> time
xij variables:
y1 y2 ... y4 -> y
-----------------------------------------------------------------------------
. xtset id time
panel variable: id (strongly balanced)
time variable: time, 1 to 4
delta: 1 unit
. preserve
. drop if missing(y)
(7 observations deleted)
. xtdescribe
id: 1, 2, ..., 5 n = 5
time: 1, 2, ..., 4 T = 4
Delta(time) = 1 unit
Span(time) = 4 periods
(id*time uniquely identifies each observation)
Distribution of T_i: min 5% 25% 50% 75% 95% max
2 2 2 2 3 4 4
Freq. Percent Cum. | Pattern
---------------------------+---------
1 20.00 20.00 | ..11
1 20.00 40.00 | .11.
1 20.00 60.00 | 11..
1 20.00 80.00 | 11.1
1 20.00 100.00 | 1111
---------------------------+---------
5 100.00 | XXXX
* ssc inst xtpatternvar
. xtpatternvar, gen(pattern)
* ssc inst groups
. groups pattern
+------------------------------------+
| pattern Freq. Percent % <= |
|------------------------------------|
| ..11 2 15.38 15.38 |
| .11. 2 15.38 30.77 |
| 11.. 2 15.38 46.15 |
| 11.1 3 23.08 69.23 |
| 1111 4 30.77 100.00 |
+------------------------------------+
. restore
. egen nmissing = total(missing(y)), by(time)
. tabdisp time, c(nmissing)
----------------------
time | nmissing
----------+-----------
1 | 2
2 | 1
3 | 2
4 | 2
----------------------
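As a cross-check on the listings above, the same summaries are easy to reproduce by hand. This Python sketch (an illustration only, with made-up names) tabulates the presence pattern of each panel and counts missing and non-missing values per period for the five-observation example.

```python
from collections import Counter

# The example data; None marks a missing value.
DATA = [
    [1, 1, None, 1],
    [1, 1, None, None],
    [None, None, 1, 1],
    [None, 1, 1, None],
    [1, 1, 1, 1],
]

def pattern(row):
    # "1" where the value is present, "." where it is missing.
    return "".join("." if v is None else "1" for v in row)

# Frequency of each pattern across panels (what xtdescribe tabulates).
pattern_freq = Counter(pattern(row) for row in DATA)

# Per-period counts; zip(*DATA) walks the data column by column.
missing_by_period = [sum(v is None for v in col) for col in zip(*DATA)]
present_by_period = [sum(v is not None for v in col) for col in zip(*DATA)]

print(sorted(pattern_freq.items()))
print(missing_by_period, present_by_period)
```

Note that total(missing(y)) counts missing values, so the per-period figures 2, 1, 2, 2 in the table are the numbers of missing observations; the numbers present are 3, 4, 3, 3.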
I want to write a prime-checking function for the purpose of learning J.
So far I've come up with this:
=&0+/(=&0)(2+i.(-&2)y)|y
It works great, except that I have to store the number in the variable y first.
y=.5
=&0+/(=&0)(2+i.(-&2)y)|y NB. prime checker
1
y=.13
=&0+/(=&0)(2+i.(-&2)y)|y NB. prime checker
1
y=.14
=&0+/(=&0)(2+i.(-&2)y)|y NB. prime checker
0
How do I write a function that takes an argument? i.e. f 13 -> 1
You can just define a verb using 3 : like this:
f =: 3 :'=&0+/(=&0)(2+i.(-&2)y)|y'
f 5
1
f 13
1
f 10
0
When using 3 :, y always refers to the right-hand argument of the verb.
If you want to define a dyadic verb, use 4 : and x for the left argument.
Btw, you can set the value of a variable anywhere:
=&0+/(=&0)(2+i.(-&2)y)|y=.5
1
=&0+/(=&0)(2+i.(-&2)y)|y=.10
0
You might find the Defining Verbs Guide on the J Wiki useful.
As has already been mentioned you can take your sentence and define it as a verb using the following syntax:
isPrime0=: 3 : '=&0+/(=&0)(2+i.(-&2)y)|y'
However it is probably more natural to write it like this:
isPrime1=: 3 : '0 = (+/ 0 = (2 + i. y - 2) | y)'
You could also define a tacit version (doesn't refer to the arguments) like any of the following:
isPrime2=: 0 = [: +/ 0 = ] |~ 2 + [: i. 2 -~ ]
isPrime3=: 0 = [: +/ 0 = ] |~ 2 + i.@:-&2 NB. replace train with verb composed using conjunctions
isPrime4=: 0 = [: +/ 0 = ] |~ i.&.(-&2) NB. use Under to re-add the 2 after Integers
isPrime5=: 0 -.@e. i.&.(-&2) | ] NB. check no zero in result
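If it helps to see what all of these verbs compute, here is the same trial-division logic as a Python sketch: take the remainders of y by every integer from 2 to y - 1 and check that none of them is zero. The name is_prime is mine, and the sketch assumes an integer argument of at least 2.

```python
def is_prime(n):
    # Assumes n >= 2. Count the divisors of n among 2..n-1.
    divisors = sum(1 for d in range(2, n) if n % d == 0)
    return divisors == 0

print([n for n in range(2, 20) if is_prime(n)])
```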
I have written a linear solver employing Householder reflections/transformations in ANSI C which solves Ax=b given A and b. I want to use it to find the eigenvector associated with an eigenvalue, like this:
(A-lambda*I)x = 0
The problem is that the 0 vector is always the solution that I get (before someone says it, yes I have the correct eigenvalue with 100% certainty).
Here's an example which pretty accurately illustrates the issue:
Given A-lambda*I (example just happens to be Hermitian):
1 2 0 | 0
2 1 4 | 0
0 4 1 | 0
Householder reflections/transformation will yield something like this
# # # | 0
0 # # | 0
0 0 # | 0
Back substitution will find that solution is {0,0,0}, obviously.
It's been a while since I've written an eigensolver, but I seem to recall that the trick was to refactor it from (A - lambda*I) * x = 0 to A*x = lambda*x. Then your Householder or Givens steps will give you something like:
# # # | #
0 # # | #
0 0 1 | 1
...from which you can back substitute without reaching the degenerate 0 vector. Usually you'll want to deliver x in normalized form as well.
My memory is quite rusty here, so I'd recommend checking Golub & Van Loan for the definitive answer. There are quite a few tricks involved in getting this to work robustly, particularly for the non-symmetric case.
This is basically the same answer as @Drew, but explained a bit differently.
If A is the matrix
1 2 0
2 1 4
0 4 1
then the eigenvalues are lambda = 1, 1+sqrt(20), 1-sqrt(20). Let us take for simplicity lambda = 1. Then the augmented matrix for the system (A - lambda*I) * x = 0 is
0 2 0 | 0
2 0 4 | 0
0 4 0 | 0
Now you do the Householder / Givens to reduce it to upper triangular form. As you say, you get something of the form
# # # | 0
0 # # | 0
0 0 # | 0
However, the last # should be zero (or almost zero). Exactly what you get depends on the details of the transformations you do, but if I do it by hand I get
2 0 4 | 0
0 2 0 | 0
0 0 0 | 0
Now you do backsubstitution. In the first step, you solve the equation in the last row. However, this equation does not yield any information, so you can set x[2] (the last element of the vector x) to any value you want. If you set it to zero and continue the back-substitution with that value, you get the zero vector. If you set it to one (or any nonzero value), you get a nonzero vector. The idea behind Drew's answer is to replace the last row with 0 0 1 | 1 which sets x[2] to 1.
Round-off error means that the last #, which should be zero, is probably not quite zero but some small value like 1e-16. This can be ignored: just take it as zero and set x[2] to one.
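A rough sketch of that modified back-substitution, written in Python for brevity rather than ANSI C: it assumes the reduced matrix R is upper triangular, that only the last pivot is (nearly) zero, and that an absolute tolerance of 1e-12 is sensible for the scale of the matrix. The name null_vector and the tolerance are my own choices.

```python
def null_vector(R, tol=1e-12):
    # R: upper-triangular matrix (list of rows) whose last pivot is ~0.
    n = len(R)
    if abs(R[n - 1][n - 1]) > tol:
        raise ValueError("last pivot is not (nearly) zero; check lambda")
    x = [0.0] * n
    x[n - 1] = 1.0  # the free variable: any nonzero value would do
    for i in range(n - 2, -1, -1):
        # Solve row i for x[i] using the already known x[i+1:].
        s = sum(R[i][j] * x[j] for j in range(i + 1, n))
        x[i] = -s / R[i][i]
    return x

# The hand-reduced matrix from above (lambda = 1).
R = [[2.0, 0.0, 4.0],
     [0.0, 2.0, 0.0],
     [0.0, 0.0, 0.0]]
x = null_vector(R)
print(x)
```

For this example the result is proportional to (-2, 0, 1), which you can check against A*x = 1*x. Normalize it afterwards if you need a unit vector. A repeated eigenvalue or a zero pivot elsewhere on the diagonal would need more care than this sketch provides.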
Obligatory warning: I assume you are implementing this for fun or educational purposes. If you need to find eigenvectors in serious code, you are better off using code written by others as this stuff is tricky to get right.