Why does an array of text lines appear to have an extra level of container?

I'm reading a file using the "array of lines" mode of Dyalog's ⎕nget:
lines _ _ ← ⎕nget '/usr/share/dict/words' 1
And it appears to work:
lines[1]
10th
But the individual elements don't appear to be character arrays:
line ← lines[1]
line
10th
≢ line
1
⍴ line
Here we see that the first line has a tally of 1 and a shape of the empty array. I can't index into it any further; lines[1][1] or line[1] is a RANK ERROR. If I use ⊂ on the RHS I can assign the value to multiple variables at once and get the same behavior for each variable. But if I do a multiple assignment without the left shoe, I get this:
word rest ← line
word
10th
≢ word
4
⍴ word
4
At last we have the character array I expected! Yet it was not evidently separated from anything else hidden in line; the other variable is identical:
rest
10th
≢ rest
4
⍴ rest
4
word ≡ rest
1
Significantly, when I look at word it has no leading space, unlike line. So it seems that the individual array elements in the content matrix returned by ⎕nget are further wrapped in something that doesn't show up in shape or tally, and can't be indexed into, but when I use a destructuring assignment it unwraps them. It feels rather like the multiple-values stuff in Common Lisp.
If someone could explain what's going on here, I'd appreciate it. I feel like I'm missing something incredibly basic.

The result of reading a file with "array of lines" mode is a nested array. It is specifically a nested vector of character vectors where each character vector is a line from your text file.
For example, take \tmp\test.txt here:
my text file
has 3
lines
If we read this in, we can inspect the contents
(content encoding newline) ← ⎕nget'\tmp\test.txt' 1
≢ content ⍝ How many lines?
3
≢¨content ⍝ How long is each line?
12 5 5
content[2] ⍝ Indexing returns a scalar (non-simple)
┌─────┐
│has 3│
└─────┘
2⊃content ⍝ Use pick to get the contents of the 2nd scalar
has 3
⊃content[2] ⍝ Disclose the non-simple scalar
has 3
As you probably read from the online documentation, the default behaviour of ⎕NGET is to bring in a simple (non-nested) character vector with embedded new line characters. These are typically operating-system dependent.
(content encoding newline) ← ⎕nget'\tmp\test.txt'
newline ⍝ Unicode code points for line endings in this file (Microsoft Windows)
13 10
content
my text file
has 3
lines
content ∊ ⎕ucs 10 13
0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 1
But with "array of lines" mode, you get a nested result.
For a quick introduction to nested arrays and the array model, see Stefan Kruger's LearnAPL book.

If you turn boxing on it's easier to see what's happening. Each element is an enclosed character vector. Use pick ⊃ instead of bracket index [] to get the actual item.
words ← ⊃⎕nget'/usr/share/dict/words'1
]box on -s=max
⍴words
┌→─────┐
│235886│
└~─────┘
words[10]
┌─────────┐
│ ┌→────┐ │
│ │Aaron│ │
│ └─────┘ │
└∊────────┘
10⊃words ⍝ use pick
┌→────┐
│Aaron│
└─────┘

Related

How to get data from specific input file format skipping line with scanf [closed]

I have a .txt file that contains info like this:
9:0
B1 0 0 0 0 0
B2 0 0 0 0 0
B3 4 5 0 0 0
B4 1 2 3 0 0
9:1
B1 0 0 0 0 0
B2 0 0 0 0 0
B3 4 5 0 0 0
B4 1 2 3 0 0
9:2
B1 0 0 0 0 0
B2 0 0 0 0 0
B3 4 5 0 0 0
B4 1 2 3 0 0
(...)
As you can see, the format is a line with the time, followed by four lines that each hold a "B" identifier and the five elements of an array.
I can easily read the whole file in a loop (I know how many lines it has): first a scanf to get the time, then another scanf to read the "B" plus the identifier, then loop four times with scanf to get the integers of the array, and then go back to get the next time.
This works fine, but (again) as you can see, there are a lot of zeros in the array space, and it would be a lot faster to check the first element and, if it is zero, skip ahead and read the next box. I tried this with a break; statement, but break ruins the structure of my loop: sometimes a B identifier ends up stored in the time variable.
I'm wondering if there's another way to skip the zeros when I find one and jump to the next B identifier. For example, after reading 9:0, I get B1 and read the first zero, so I skip that line and move on to B2, continuing until I find a non-empty array.
If anyone could help me, please!
I think a reasonably fast solution might be to read the entire line into a static char array, skipping to the next line if the 4th character is 0, and using sscanf to read the values otherwise.
My reasoning here is that the work of fscanf is split between (1) reading in a string from a file, and (2) parsing that string into numbers, and moving them into the provided variables. However, I know IO operations can be pretty slow, so here is another alternative that does slightly less IO.
Use fscanf to read in only the first two tokens of a line (namely, the B_ and the first number), and use fseek to skip to the next line if the first number is 0. I'm not super confident that this will be faster, but it's something you could try.
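The first suggestion (read the whole line, then decide whether to parse it) could be sketched roughly as follows. The names parse_box_line, count_nonempty_boxes and BOX_VALUES are made up for illustration, and instead of testing the 4th character the sketch parses the identifier and first value with sscanf, which is an assumption chosen so that identifiers such as B10 would still work.

```c
#include <assert.h>
#include <stdio.h>

#define BOX_VALUES 5  /* hypothetical name: values per "B" line */

/* Parses one "B<id> v1 v2 v3 v4 v5" line.
 * Returns 1 and fills id/values when the box is non-empty,
 * 0 when the first value is zero (line skipped),
 * -1 when the line is not a box line (for example a "9:0" time line). */
int parse_box_line(const char *line, int *id, int values[BOX_VALUES])
{
    int first;

    assert(line != NULL && id != NULL && values != NULL);

    /* Cheap pre-check: read only the identifier and the first value. */
    if (sscanf(line, "B%d %d", id, &first) != 2)
        return -1;
    if (first == 0)
        return 0;  /* empty box: do not parse the rest of the line */

    if (sscanf(line, "B%*d %d %d %d %d %d",
               &values[0], &values[1], &values[2],
               &values[3], &values[4]) != BOX_VALUES)
        return -1;
    return 1;
}

/* Reads fp line by line with fgets and counts the non-empty boxes. */
int count_nonempty_boxes(FILE *fp)
{
    char line[128];
    int id, values[BOX_VALUES], count = 0;

    while (fgets(line, sizeof line, fp) != NULL) {
        if (parse_box_line(line, &id, values) == 1)
            count++;
    }
    return count;
}
```

Because every line is consumed whole by fgets, skipping a box never leaves the stream in the middle of a line, which is the problem the break; approach ran into.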

What input operator can we use in C that ignores 'space' and accepts 'Enter' while taking array inputs

Consider an example:
5
1 0 5
1 1 7
1 0 3
2 1 0
2 1 1
Here, in the first line, 5 denotes the size of the array.
I'm entering five sequences one by one.
I want the first sequence, i.e. 1 0 5, to be stored in arr[0].
Note: 1, 0 and 5 are separated by spaces.
However, arr[0] should contain 105 without any space.
I want to accept the next sequence into arr[1] only after pressing 'Enter'.
So that arr[1] should contain 117, arr[2] should contain 103 and so on up to arr[4].
Is there any operator that I can use for this?
There are no operators that do I/O in C at all, so no.
I also don't think there's any standard function with those semantics; they tend to view all whitespace as equal.
You should write your own, probably using fgets() to read in whole lines and then extracting the digits to convert to integers.
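A minimal sketch of that suggestion, assuming each sequence fits on one line of fewer than 256 characters and that the combined digits fit in an int (overflow is not checked). The function names digits_to_int and read_sequences are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/* Builds one integer from all decimal digits on a line, ignoring
 * spaces and the trailing newline, so "1 0 5\n" gives 105.
 * Overflow is not checked in this sketch. */
int digits_to_int(const char *line)
{
    int value = 0;

    assert(line != NULL);
    for (const char *p = line; *p != '\0'; p++) {
        if (isdigit((unsigned char)*p))
            value = value * 10 + (*p - '0');
    }
    return value;
}

/* Reads a count line followed by that many sequences from fp into arr.
 * Returns the number of entries stored (at most cap). */
int read_sequences(FILE *fp, int *arr, int cap)
{
    char line[256];
    int n, stored = 0;

    if (fgets(line, sizeof line, fp) == NULL || sscanf(line, "%d", &n) != 1)
        return 0;
    while (stored < n && stored < cap && fgets(line, sizeof line, fp) != NULL)
        arr[stored++] = digits_to_int(line);
    return stored;
}
```

Calling read_sequences(stdin, arr, 5) on the example input would then advance to the next array slot only when a full line (ended by Enter) has been read.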

Accessing two different rows simultaneously in C

Suppose I have a data set arranged in the following way
19 10 1 1
12 15 1 1
13 12 4 5
10 5 2 3
...
and so on. At a particular iteration in a for loop I have to read only the 1st and the 4th rows, and in the next iteration I have to access some other set of rows, for example
1st iteration:
1st row: 19 10 1 1
4th row: 10 5 2 3
I will access my data using the fscanf() function. But how will I ensure that I choose only the 1st and 4th rows, or any two rows for that matter, at a given iteration?
(I have not considered reading it into a 2D array since the size of data set is 10^8 )
Thank you.
As you read through your data (say, stored in a standard file), record the byte offset of each row by looking for row delimiters (newline characters). You can then read out any row by seeking to its start offset with fseek() on the FILE * and reading up to its end offset. Storing a few byte offsets (often an eight-byte long or equivalent) is cheap.
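One way this could look, as a sketch only: index_rows and read_row are hypothetical helper names, the file is assumed to be opened in binary mode so that counted bytes match fseek offsets, and each row is assumed to hold four integers as in the example. For a very large file you would likely store offsets only for the rows you need rather than for all of them.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Scans fp once from the beginning and records the byte offset at which
 * each row starts. Returns a malloc'd array (caller frees) and stores the
 * row count in *count; returns NULL on allocation failure. */
long *index_rows(FILE *fp, size_t *count)
{
    size_t cap = 16, n = 0;
    long pos = 0;
    int c, at_line_start = 1;
    long *offsets;

    assert(fp != NULL && count != NULL);
    offsets = malloc(cap * sizeof *offsets);
    if (offsets == NULL)
        return NULL;

    rewind(fp);
    while ((c = fgetc(fp)) != EOF) {
        if (at_line_start) {
            if (n == cap) {  /* grow the offset table */
                long *tmp = realloc(offsets, (cap *= 2) * sizeof *offsets);
                if (tmp == NULL) {
                    free(offsets);
                    return NULL;
                }
                offsets = tmp;
            }
            offsets[n++] = pos;
        }
        at_line_start = (c == '\n');
        pos++;
    }
    *count = n;
    return offsets;
}

/* Seeks to the start of row `row` (0-based) and parses four integers.
 * Returns 1 on success, 0 on failure. */
int read_row(FILE *fp, const long *offsets, size_t row, int out[4])
{
    if (fseek(fp, offsets[row], SEEK_SET) != 0)
        return 0;
    return fscanf(fp, "%d %d %d %d",
                  &out[0], &out[1], &out[2], &out[3]) == 4;
}
```

With the index built once, an iteration that needs the 1st and 4th rows would call read_row with rows 0 and 3, in any order.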

J and L-systems

I'm going to create a program that can generate strings from L-system grammars.
Aristid Lindenmayer's original L-System for modelling the growth of algae is:
variables : A B
constants : none
axiom : A
rules : (A → AB), (B → A)
which produces:
iteration | resulting model
0 | A
1 | AB
2 | ABA
3 | ABAAB
4 | ABAABABA
5 | ABAABABAABAAB
that is naively implemented by myself in J like this:
algae =: 1&algae : (([: ; (('AB'"0)`('A'"0) @. ('AB' i. ]))&.>"0)^:[) "1 0 1
(i.6) ([;algae)"1 0 1 'A'
┌─┬─────────────┐
│0│A │
├─┼─────────────┤
│1│AB │
├─┼─────────────┤
│2│ABA │
├─┼─────────────┤
│3│ABAAB │
├─┼─────────────┤
│4│ABAABABA │
├─┼─────────────┤
│5│ABAABABAABAAB│
└─┴─────────────┘
Step-by-step illustration:
('AB' i. ]) 'ABAAB' NB. determine indices of productions for each variable
0 1 0 0 1
'AB'"0`('A'"0)@.('AB' i. ])"0 'ABAAB' NB. apply corresponding productions
AB
A
AB
AB
A
'AB'"0`('A'"0)@.('AB' i. ])&.>"0 'ABAAB' NB. the same &.> to avoid filling
┌──┬─┬──┬──┬─┐
│AB│A│AB│AB│A│
└──┴─┴──┴──┴─┘
NB. finally ; and use ^: to iterate
By analogy, here is a result of the 4th iteration of L-system that generates Thue–Morse sequence
4 (([: ; (0 1"0)`(1 0"0)@.(0 1 i. ])&.>"0)^:[) 0
0 1 1 0 1 0 0 1 1 0 0 1 0 1 1 0
That is the best that I can do so far. I believe that boxing-unboxing method is insufficient here. This is the first time I've missed linked-lists in J - it's much harder to code grammars without them.
What I'm really thinking about is:
a) constructing a list of gerunds of those functions that build final string (in my examples those functions are constants like 'AB'"0 but in case of tree modeling functions are turtle graphics commands) and evoking (`:6) it,
or something that I am able to code:
b) constructing a string of legal J sentence that build final string and doing (".) it.
But I'm not sure if these programs are efficient.
Can you show me a better approach please?
Any hints as well as comments about a) and b) are highly appreciated!
The following will pad the rectangular array with spaces:
L=: rplc&('A';'AB';'B';'A')
L^:(<6) 'A'
A
AB
ABA
ABAAB
ABAABABA
ABAABABAABAAB
Or if you don't want padding:
L&.>^:(<6) <'A'
┌─┬──┬───┬─────┬────────┬─────────────┐
│A│AB│ABA│ABAAB│ABAABABA│ABAABABAABAAB│
└─┴──┴───┴─────┴────────┴─────────────┘
Obviously you'll want to inspect rplc / stringreplace to see what is happening under the covers.
You can use complex values in the left argument of # to expand an array without boxing.
For this particular L-system, I'd probably skip the gerunds and use a temporary substitution:
to =: 2 : 'n (I.m=y) } y' NB. replace m with n in y
ins =: 2 : '(1 j. m=y) #!.n y' NB. insert n after each m in y
L =: [: 'c'to'A' [: 'A'ins'B' [: 'B'to'c' ]
Then:
L^:(<6) 'A'
A
AB
ABA
ABAAB
ABAABABA
ABAABABAABAAB
Here's a more general approach that simplifies the code by using numbers and a gerund composed of constant functions:
'x'-.~"1 'xAB'{~([:,(0:`(1:,2:)`1:)@.]"0)^:(<6) 1
A
AB
ABA
ABAAB
ABAABABA
ABAABABAABAAB
The AB are filled in at the end for display. There's no boxing here because I use 0 as a null value. These get scattered around quite a bit but the -.~"1 removes them. It does pad all the resulting strings with nulls on the right. If you don't want that, you can use <@-.~"1 to box the results instead:
'x'<@-.~"1 'xAB'{~([:,(0:`(1:,2:)`1:)@.]"0)^:(<6) 1
┌─┬──┬───┬─────┬────────┬─────────────┐
│A│AB│ABA│ABAAB│ABAABABA│ABAABABAABAAB│
└─┴──┴───┴─────┴────────┴─────────────┘

C: how to store integers by reading line by line with a random layout?

I need to read a file and store each number (int) in a variable. When it sees \n or a "-", it needs to store into the next variable (the minus sign denotes a range: 1-5 means it should store the numbers from 1 to 5). How should I proceed?
I was thinking of using fgets() but I can't find a way to do what I want.
The input looks like this:
0
0
5 10
4
2 4
5-10 2 3 4 6 7-9
4 3
These are x y positions.
I'd use fscanf to read one int at a time, and when it's negative, it is obviously the second part of a range. Or is -4--2 a valid input?
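A rough sketch of that idea, assuming the input never contains genuinely negative numbers (so an input like -4--2 is not handled). The function name read_numbers is hypothetical; it expands a range a-b into a, a+1, ..., b.

```c
#include <assert.h>
#include <stdio.h>

/* Reads whitespace-separated integers from fp one at a time. A negative
 * value -b that directly follows a value a is treated as the range a-b
 * and expanded. A negative value with no preceding number is stored as-is.
 * Stores at most cap values in out and returns how many were stored. */
size_t read_numbers(FILE *fp, int *out, size_t cap)
{
    size_t n = 0;
    int value, prev = 0, have_prev = 0;

    assert(fp != NULL && out != NULL);
    while (n < cap && fscanf(fp, "%d", &value) == 1) {
        if (value < 0 && have_prev) {
            /* "5-10" was read as 5 and then -10: fill in 6..10 */
            for (int v = prev + 1; v <= -value && n < cap; v++)
                out[n++] = v;
            have_prev = 0;  /* a range end cannot start another range */
        } else {
            out[n++] = value;
            prev = value;
            have_prev = 1;
        }
    }
    return n;
}
```

This works because %d stops reading at the minus sign in 5-10 and then consumes -10 as the next integer. Note that it treats newlines like any other whitespace, so line boundaries would need separate handling (for example with fgets) if they matter.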

Resources