How do you calculate the number of elements in a jagged array in F#?

I am new to F# and haven't found the answer to this anywhere. I am creating a jagged array with 10 rows, where row n holds n elements (so the rows grow from 1 to 10 columns). The code I used to create and print the array is as follows:
let jagged = [| for a in 1 .. 10 do yield [| for _ in 1 .. a do yield 0 |] |]
let mutable len = 0
for arr in jagged do
    for col in arr do
        len <- len + 1
        printf "%i " col
    printfn ""
printfn "%i" len
The code above produces the following output:
0
0 0
0 0 0
0 0 0 0
0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
55
Currently, I am calculating the number of elements manually but would like to know if there is a better way to do so.

If you want to calculate the length of a single array, you could use Array.length. But what you have is an array of arrays of different lengths, and you want to calculate the sum of their sizes. Rather than just give you the answer, I'll show you how you could use https://fsharpforfunandprofit.com/posts/list-module-functions/ (a site by Scott Wlaschin that's a really terrific resource, BTW) to find the answer yourself. This page presents a series of questions to help you find the functions you're looking for: starting from question 1, you move to other questions and eventually to a list of useful functions.
Question 1 on that page is, "What kind of collection do you have?" The choices are "I don't have a collection and I want to create one", or "I have one collection I want to work with", or several other choices where you have two or three or more collections. Here, we have one collection we want to work with, so the page directs us to question 9.
Question 9 on that page has a bunch of choices I won't repeat here, but one of them is "If you want to aggregate or summarize the collection into a single value". That sounds like what we want: we want the sum of the lengths of the sub-arrays. So we go to section 14, which has a bunch of functions we could use. Halfway down the list are sum and sumBy. Those sound intriguing. The sum function "returns the sum of the elements in the collection"... well, no, that won't work, because our array contains arrays, not numbers. But the sumBy function "returns the sum of the results generated by applying the function to each element of the collection." And we know there's a function for finding the length of a single array: Array.length. (The page talks about functions that work on lists, but pretty much any function that works on lists has a corresponding function that works on arrays and another that works on sequences. The few exceptions involve things like infinite sequences: you can have an infinite sequence but not an infinite array or list, so there's a Seq.initInfinite function but no Array.initInfinite or List.initInfinite function.)
So now that we've found that, we just need to write it.
let lengthOfJaggedArray arr = arr |> Array.sumBy Array.length
And that's it. Instead of calculating the length by hand via two nested for loops, there's a one-line solution that's quite simple and uses built-in functions. All you needed to do was know what functions are available — and since the entire list of available array/list/seq functions can be a little daunting when you're new to F#, Scott Wlaschin has made a very useful resource to help make it a bit less daunting.
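The idea behind Array.sumBy Array.length is language-independent: the total is the sum of the inner lengths. As a cross-check, here is a minimal Python sketch of the same computation (Python is used only for illustration; the F# one-liner is the actual answer, and the name jagged_length is hypothetical). Because row a holds a elements, the total for this particular array is 1 + 2 + ... + 10 = 10 · 11 / 2 = 55, which matches the count printed by the loops.

```python
from typing import Sequence


def jagged_length(jagged: Sequence[Sequence[object]]) -> int:
    """Total number of elements: the analogue of Array.sumBy Array.length."""
    return sum(len(row) for row in jagged)


# Same shape as the F# example: row a holds a zeros, for a = 1..10.
jagged = [[0] * a for a in range(1, 11)]
print(jagged_length(jagged))  # 55
```

As in F#, no mutable counter or nested loop is needed; the per-row length function does all the work.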

Related

Creating a series of 2-dimensional arrays from a text file in Julia

I'm trying to write a Sudoku solver, which is the fun part. The un-fun part is actually loading the puzzles into Julia from a text file. The text file consists of a series of puzzles comprising a label line followed by 9 lines of digits (0s being used to denote blank squares). The following is a simple example of the sort of text file I am using (sudokus.txt):
Easy 7
000009001
008405670
940000032
034061800
070050020
002940360
890000056
061502700
400700000
Medium 95
000300100
800016070
000009634
001070000
760000015
000020300
592400000
030860002
007002000
Hard 143
000003700
305061000
000200004
067002100
400000003
003900580
200008000
000490308
008100000
What I want to do is strip out the label lines and store the 9x9 grids in an array. File input operations are not my specialist subject, and I've tried various methods such as read(), readcsv(), readlines() and readline(). I don't know whether there is any advantage to storing the digits as characters rather than integers, but leading zeros have to be maintained (a problem I have encountered with some input methods and with abortive attempts to use parse()).
I've come up with a solution, but I suspect it's far from optimal:
function main()
    open("Text Files\\sudokus.txt") do file
        grids = Vector{Matrix{Int}}()
        grid = Matrix{Int}(0,9)
        row_no = 0
        for line in eachline(file)
            if !(all(i -> isnumber(i), line))
                continue
            else
                row_no += 1
                squares = split(line, "")
                row = transpose([parse(Int, square) for square in squares])
                grid = vcat(grid, row)
                if row_no == 9
                    push!(grids, grid)
                    grid = Matrix{Int}(0,9)
                    row_no = 0
                end
            end
        end
        return grids
    end
end
@time main()
I initially ran into @code_warntype problems from the closure, but I seem to have solved those by moving my grids, grid and row_no variables from the main() function into the open block.
Can anyone come up with a more efficient way to achieve my objective or improve my code? Is it possible, for example, to load 10 lines at a time from the text file? I am using Julia 0.6, but solutions using 0.7 or 1.0 will also be useful going forward.
I assume your file is well structured, by which I mean that lines 1, 11, 21, ... contain the difficulty label and the lines between them contain the sudoku rows. If so, the number of lines tells us the number of sudokus in the file, and the code uses this to pre-allocate an array of exactly the size needed.
If your file is too big, you can use eachline instead of readlines: readlines reads all the lines of the file into RAM at once, while eachline creates an iterator that reads the lines one by one.
function readsudoku(file_name)
    lines = readlines(file_name)
    sudokus = Array{Int}(undef, 9, 9, div(length(lines), 10)) # the last dimension is for each sudoku
    for i in 1:length(lines)
        if i % 10 != 1 # if i % 10 == 1 you have a difficulty line
            sudokus[(i - 1) % 10, :, div(i - 1, 10) + 1] .= parse.(Int, collect(lines[i])) # collect is used to create an array of `Char`s
        end
    end
    return sudokus
end
This should run on Julia 0.7 and 1.0, but I do not know whether it runs on 0.6. On 0.6 you would probably need to remove the undef argument from the Array allocation.
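For files too large to read at once, the eachline idea can be sketched in Python with a generator that reads one line at a time and yields each grid as soon as its ninth row arrives. This is an illustrative sketch in another language, not a drop-in for the Julia code: like the question's own code (and unlike the position-based readsudoku), it assumes any line made up only of decimal digits is a grid row and skips everything else as a label. The names iter_sudokus and Grid are my own.

```python
from typing import Iterable, Iterator, List

Grid = List[List[int]]


def iter_sudokus(lines: Iterable[str]) -> Iterator[Grid]:
    """Yield one 9x9 grid at a time from an iterable of text lines.

    Lines that are not made up entirely of decimal digits (the "Easy 7"
    style labels, or blank lines) are skipped.
    """
    grid: Grid = []
    for line in lines:
        line = line.strip()
        if not line.isdecimal():  # label or blank line
            continue
        # Convert character by character, so leading zeros are kept.
        grid.append([int(ch) for ch in line])
        if len(grid) == 9:
            yield grid
            grid = []


# Usage: a Python file object is itself a lazy line iterator.
# with open("sudokus.txt") as f:
#     for grid in iter_sudokus(f):
#         ...
```

Only one partial grid is held in memory at a time, which is the point of preferring a streaming reader over reading every line up front.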
Similar to Hckr's (faster) approach, my first idea is:
s = readlines("sudoku.txt")
n = div(length(s), 10)  # number of puzzles: one label line + 9 rows each
smat = reshape(s, 10, n)
sudokus = Dict{String, Matrix{Int}}()
for k in 1:n
    sudokus[smat[1, k]] = parse.(Int, permutedims(hcat(collect.(Char, smat[2:end, k])...), (2, 1)))
end
which produces
julia> sudokus
Dict{String,Array{Int64,2}} with 3 entries:
"Hard 143" => [0 0 … 0 0; 3 0 … 0 0; … ; 0 0 … 0 8; 0 0 … 0 0]
"Medium 95" => [0 0 … 0 0; 8 0 … 7 0; … ; 0 3 … 0 2; 0 0 … 0 0]
"Easy 7" => [0 0 … 0 1; 0 0 … 7 0; … ; 0 6 … 0 0; 4 0 … 0 0]
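Reshaping the lines into a 10-row matrix amounts to splitting them into blocks of ten. A Python sketch of the same label-keyed dictionary, assuming every block is exactly one label line followed by nine digit rows (a trailing blank line would break that assumption), derives the puzzle count from the number of lines rather than fixing it. As with the Dict in the Julia version, duplicate labels would overwrite each other. The function name is hypothetical.

```python
from typing import Dict, List, Sequence

Grid = List[List[int]]


def sudokus_by_label(lines: Sequence[str]) -> Dict[str, Grid]:
    """Map each label line to its 9x9 grid, assuming strict 10-line blocks."""
    clean = [ln.rstrip("\r\n") for ln in lines]
    if len(clean) % 10 != 0:
        raise ValueError("expected blocks of 10 lines: one label + 9 rows")
    puzzles: Dict[str, Grid] = {}
    # One iteration per block; the count follows from len(clean).
    for start in range(0, len(clean), 10):
        label = clean[start]
        rows = clean[start + 1:start + 10]
        # Parse digit by digit so leading zeros survive.
        puzzles[label] = [[int(ch) for ch in row] for row in rows]
    return puzzles
```

Failing loudly on a line count that is not a multiple of ten is a design choice: it surfaces a malformed file instead of silently misaligning labels and rows.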

Matlab One Hot Encoding - convert column with categoricals into several columns of logicals

CONTEXT
I have a large number of columns with categoricals, all with different, unrankable choices. To make my life easier for analysis, I'd like to take each of them and convert it to several columns with logicals. For example:
1 GENRE
2 Pop
3 Classical
4 Jazz
...would turn into...
1 Pop Classical Jazz
2 1 0 0
3 0 1 0
4 0 0 1
PROBLEM
I've tried using ind2vec but this only works with numericals or logicals. I've also come across this but am not sure it works with categoricals. What is the right function to use in this case?
If you want to convert from a categorical vector to a logical array, you can use the unique function to generate column indices, then perform your encoding using any of the options from this related question:
% Sample data:
data = categorical({'Pop'; 'Classical'; 'Jazz'; 'Pop'; 'Pop'; 'Jazz'});
% Get unique categories and create indices:
[genre, ~, index] = unique(data)
genre =
Classical
Jazz
Pop
index =
3
1
2
3
3
2
% Create logical matrix:
mat = logical(accumarray([(1:numel(index)).' index], 1))
mat =
6×3 logical array
0 0 1
1 0 0
0 1 0
0 0 1
0 0 1
0 1 0
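Outside MATLAB, the same unique-then-index encoding can be sketched in plain Python: sort the distinct categories (as unique does here), map each category to a column, and set exactly one True per row. This is an illustrative sketch; the name one_hot is my own, and Python's sorted() orders strings by code point, which matches the output above but could differ from MATLAB's category order in general. With pandas available, a built-in such as get_dummies would be the usual choice; this version avoids the dependency.

```python
from typing import Dict, List, Sequence, Tuple


def one_hot(values: Sequence[str]) -> Tuple[List[str], List[List[bool]]]:
    """Return (categories, matrix) where matrix[r][c] is True iff values[r] == categories[c]."""
    categories = sorted(set(values))  # like unique(): sorted distinct values
    column: Dict[str, int] = {c: i for i, c in enumerate(categories)}
    matrix = [[False] * len(categories) for _ in values]
    for row, value in enumerate(values):
        matrix[row][column[value]] = True  # exactly one True per row
    return categories, matrix


genre, mat = one_hot(["Pop", "Classical", "Jazz", "Pop", "Pop", "Jazz"])
print(genre)   # ['Classical', 'Jazz', 'Pop']
print(mat[0])  # [False, False, True]
```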
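ind2vec does work once the categories are turned into numeric indices, and you can get there via cell strings: call the cellstr function to convert the categorical data, then use ismember against the unique values.
This code may help (adapted only slightly from this):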
data = categorical({'Pop'; 'Classical'; 'Jazz';});
GENRE = cellstr(data); %change categorical data into cell strings
[~, loc] = ismember(GENRE, unique(GENRE));
genre = ind2vec(loc')';
Gen=full(genre);
array2table(Gen, 'VariableNames', unique(GENRE))
Running this code returns:
ans =
Classical Jazz Pop
_________ ____ ___
0 0 1
1 0 0
0 1 0
You can call unique(GENRE) to check the categories (as cell strings). Meanwhile, logical(Gen) (or logical(full(genre))) contains the logical columns you need.
P.S. The categorical type might be faster than cell strings, but ind2vec doesn't work with it directly, so unique and accumarray might be better.

Generating a matrix to describe a two-dimensional feature

Let's say I have a vector A = [-1,2];
Each element in A is described by the number itself and its sign, so each element has a two-dimensional feature set.
I would like to generate a matrix, in this case 2x2, where the columns correspond to the elements and the rows correspond to the presence of a feature. The presence of a feature is described by 1s and 0s: if an element is positive, its "positive" entry is 1, and if the element is the number 1 (here including -1, as the example shows), its "Is this a 1?" entry is 1 as well. In the case above I would get:
Element 1 Element 2
Is this a 1? 1 0
Is this a positive number? 0 1
What is the smartest way to accomplish this? Obviously if statements would work, but I feel there should be a faster, smarter way of going about this. I am coding this in MATLAB, by the way, and I would appreciate any help.
@Benoit_11's solution is a fine one. Here's a similar but perhaps simpler solution. You could try both and see which is faster if you care about speed.
features = [abs(A) == 1; A > 0];
This assumes A is a row vector, so that the output comes out in the format you specified.
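The one-liner works because each comparison already yields a logical row, and stacking the two rows gives one column per element. The same construction as a plain-list Python sketch (the name describe is hypothetical) reproduces the outputs shown in this thread, both for A = [-1, 2] and for the longer test array used in the next answer:

```python
from typing import List, Sequence


def describe(a: Sequence[float]) -> List[List[int]]:
    """Row 0: is the element 1 or -1?  Row 1: is it positive?  One column per element."""
    return [
        [int(abs(x) == 1) for x in a],
        [int(x > 0) for x in a],
    ]


print(describe([-1, 2]))  # [[1, 0], [0, 1]]
```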
Here is a simple way using ismember for the first condition and a logical operation for the second. ismember outputs a logical array that you can plug into the output you need (here called DescribeA), and likewise when you check for values greater than 0 using the > operator.
%// Test array
A = [-1,2,1,-10,5,-3,1]
%// Initialize output
DescribeA = zeros(2,numel(A));
%// 1st condition. Check if values are 1 or -1
DescribeA(1,:) = ismember(A,1)|ismember(A,-1);
%// Check if they are > 0
DescribeA(2,:) = A>0;
Output in Command Window:
A =
-1 2 1 -10 5 -3 1
DescribeA =
1 0 1 0 0 0 1
0 1 1 0 1 0 1
I feel there is a smarter way for the 1st condition but I can't seem to find it.

find largest rectangle not (necessarily) aligned with image boundary in binary matrix

I am using this solution to find rectangles aligned with the image border in a binary matrix. Suppose now I want to find a rectangle that is not aligned with the image border, and I don't know its orientation; what would be the fastest way to find it?
For the sake of the example, let's look for a rectangle containing only 1's. For example:
1 1 1 1 0 0 0 0 0 1 0 0 1 1 1
0 1 1 1 1 1 0 0 0 1 0 0 1 1 0
0 0 0 1 1 1 1 1 0 1 0 0 1 0 0
0 0 0 0 0 1 1 1 1 1 0 0 0 0 0
0 0 0 0 0 0 0 1 1 1 1 1 0 0 0
0 0 0 0 0 0 0 0 0 1 1 1 1 1 0
Then the algorithm in the solution mentioned above would only find a rectangle of size 6 (3x2). I would like to find a bigger rectangle that is tilted; we can clearly see a rectangle of size 10 or more...
I am working in C/C++ but an algorithm description in any language or pseudo-code would help me a lot.
Some more details:
there can be more than one rectangle in the image: I need the biggest only
the rectangle is not a beautiful rectangle in the image (I adapted my example above a little bit)
I work on large images (1280x1024) so I'm looking for the fastest solution (a brute-force O(n³) algorithm will be very slow)
(optional) if the solution can be parallelized, that is a plus (then I can boost it more using GPU, SIMD, ...)
I only have a partial answer for this question, and only a few thoughts on complexity or speed for what I propose.
Brute Force
The first idea that I see is to use the fact that your problem is discrete: rotate the image around its center and rerun the algorithm you already use to find the axis-aligned solution.
This has the downside of checking a whole lot of candidate rotations. However, these checks can be done in parallel, since they are independent of one another. This is still probably very slow, but implementing it shouldn't be too hard, and once parallelized it would give a more definite answer to the question of speed.
Note that since your workspace is a discrete matrix, there is only a finite number of rotations to browse through.
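The brute-force idea can be sketched as follows. This is a hedged Python illustration that has not been tried at 1280x1024: I am assuming the axis-aligned algorithm you already use is the common row-by-row histogram method with a stack (linear in the number of pixels per image), and I use nearest-neighbour inverse mapping for the rotation. Because a rectangle looks the same after a 90-degree turn, angles in [0, 90) suffice. Areas are counted in pixels of the resampled image, so they only approximate areas in the original, and resampling can open small holes. All function names are hypothetical.

```python
import math
from typing import List, Tuple

Grid = List[List[int]]


def max_rect_area(grid: Grid) -> int:
    """Largest axis-aligned all-ones rectangle, via the histogram/stack method."""
    if not grid:
        return 0
    width = len(grid[0])
    heights = [0] * width
    best = 0
    for row in grid:
        # Height of the run of consecutive 1s ending at this row, per column.
        for j in range(width):
            heights[j] = heights[j] + 1 if row[j] else 0
        stack: List[int] = []  # column indices with increasing heights
        for j in range(width + 1):
            h = heights[j] if j < width else 0  # sentinel flushes the stack
            while stack and heights[stack[-1]] >= h:
                top = stack.pop()
                left = stack[-1] + 1 if stack else 0
                best = max(best, heights[top] * (j - left))
            stack.append(j)
    return best


def rotate_binary(grid: Grid, degrees: float) -> Grid:
    """Rotate about the centre using nearest-neighbour inverse mapping."""
    h, w = len(grid), len(grid[0])
    t = math.radians(degrees)
    c, s = math.cos(t), math.sin(t)
    # Canvas large enough to hold the whole rotated image.
    nh = math.ceil(abs(h * c) + abs(w * s))
    nw = math.ceil(abs(w * c) + abs(h * s))
    cy, cx = (h - 1) / 2, (w - 1) / 2
    ncy, ncx = (nh - 1) / 2, (nw - 1) / 2
    out = [[0] * nw for _ in range(nh)]
    for v in range(nh):
        for u in range(nw):
            dx, dy = u - ncx, v - ncy
            # Inverse rotation: which source pixel lands here?
            x = round(c * dx + s * dy + cx)
            y = round(-s * dx + c * dy + cy)
            if 0 <= y < h and 0 <= x < w:
                out[v][u] = grid[y][x]
    return out


def largest_tilted_rect(grid: Grid, step: int = 5) -> Tuple[int, int]:
    """Try angles 0, step, ... below 90 and return (best_area, best_angle)."""
    best_area, best_angle = 0, 0
    for deg in range(0, 90, step):
        area = max_rect_area(rotate_binary(grid, deg))
        if area > best_area:
            best_area, best_angle = area, deg
    return best_area, best_angle
```

Each iteration of the angle loop in largest_tilted_rect is independent of the others, which is what makes this approach easy to spread across threads or a GPU; the step size trades accuracy against the number of rotations.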
Other Approach
The second solution I see is:
Cut your base matrix down so as to separate the connected components [1] (corresponding to the value set you're interested in).
For each one of those smaller matrices (note that they may overlap, depending on the distribution), find the minimum oriented bounding box for the value set you're interested in.
For each of those, rotate the matrix so that the minimum oriented bounding box is now axis-aligned.
Launch the algorithm you already have to find the maximum axis-aligned rectangle containing only values from your value set.
The solution found by this algorithm would be the largest rectangle obtained from all the connected components.
This second solution would probably only give you an approximation of the solution, but I believe it might be worth trying.
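The minimum oriented bounding box step can be approximated simply. The exact method is usually rotating calipers on the convex hull; as a simpler stand-in, this Python sketch scans whole-degree angles over the coordinates of the 1-pixels and keeps the angle whose bounding box has the smallest area. Extents are measured between pixel centres, the point set must be non-empty, and the sign convention of the returned angle is the sketch's own, so treat the result as an orientation estimate. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def ones_coordinates(grid: Sequence[Sequence[int]]) -> List[Point]:
    """(x, y) centre coordinates of every 1-pixel, in row-major order."""
    return [(x, y) for y, row in enumerate(grid) for x, v in enumerate(row) if v]


def min_area_box_angle(points: Sequence[Point], step_deg: int = 1) -> Tuple[int, float]:
    """Scan angles in [0, 90) and return (angle, area) of the smallest bounding box.

    Assumes points is non-empty.
    """
    best_angle, best_area = 0, math.inf
    for deg in range(0, 90, step_deg):
        t = math.radians(deg)
        c, s = math.cos(t), math.sin(t)
        # Coordinates of every point in a frame rotated by deg degrees.
        us = [c * x + s * y for x, y in points]
        vs = [-s * x + c * y for x, y in points]
        area = (max(us) - min(us)) * (max(vs) - min(vs))
        if area < best_area:
            best_angle, best_area = deg, area
    return best_angle, best_area
```

The scan costs one pass over the points per angle; running it on the hull vertices of each component instead of all pixels would cut that down considerably.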
For reference
The only solutions that I have found for the problem of the maximum/largest empty rectangle are axis-aligned. I have seen many unanswered questions corresponding to the oriented version of this problem on 2D continuous space.
EDIT:
[1] Since what we want is to separate the connected components, if there is some degree of overlap you should proceed as in the following example:
0 1 0 0
0 1 0 1
0 0 0 1
should be divided into:
0 0 0 0
0 0 0 1
0 0 0 1
and
0 1 0 0
0 1 0 0
0 0 0 0
Note that I kept the original dimensions of the matrix. I did that because I'm guessing from your post it has some importance and that a rectangle expanding further away from the boundaries would not be found as a solution (i.e. that we can't just assume there are zero values beyond the border).
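The split described in footnote [1] can be sketched with a breadth-first flood fill. This Python version rests on two assumptions of mine: pixels are connected through their 4 edge neighbours (switch to 8 neighbours if diagonal contact should join components), and each mask keeps the original matrix dimensions, as in the example. split_components is a hypothetical name.

```python
from collections import deque
from typing import Deque, List, Sequence, Tuple

Grid = List[List[int]]


def split_components(grid: Sequence[Sequence[int]]) -> List[Grid]:
    """Return one mask per 4-connected component of 1s, each the size of grid."""
    h, w = len(grid), len(grid[0])
    seen = [[False] * w for _ in range(h)]
    masks: List[Grid] = []
    for r in range(h):
        for c in range(w):
            if not grid[r][c] or seen[r][c]:
                continue
            # Flood-fill a new component starting from (r, c).
            mask = [[0] * w for _ in range(h)]
            queue: Deque[Tuple[int, int]] = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                mask[y][x] = 1
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and grid[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            masks.append(mask)
    return masks
```

On the footnote's 3x4 example it returns the two masks shown there, with the left component first because the scan is row-major. Each pixel is visited a constant number of times, so the split is linear in the image size.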
EDIT #2:
The choice of whether or not to keep the matrix dimensions is debatable since it will not directly influence the algorithm.
However, it is worth noting that if the matrices corresponding to connected components do not overlap on non-zero values, you may choose to store those matrices "in-place".
Also note that if you want to return the coordinates of the rectangle as output and you create a matrix with different dimensions for each connected component, you will have to store the position of each newly created matrix within the original one (one point, for instance the top-left corner, should be enough).

Create an "array style" countdown in C

What I want to know is whether it's possible to create a countdown in C, BUT with a condition for when it hits an "unusual" piece of data in the array. I'll explain better with examples.
This is also similar to: Read ahead in an array to predict later outcomes in C
however, it was poorly worded. So, I am rewording this question.
For example, the array is an integer array containing: 0 0 0 0 1 0 0 0 0 0 0 0 0 0 5 0 0 0 0 0 2
So, when it's zero, don't do anything.
However, when it's non-zero, display the text associated with the number (according to some condition).
With pseudocode it'd be something like this:
if 0: don't do anything (keep counting down until the next non-zero)
if != 0: display the associated text
then reset the countdown until the next non-zero.
Is there any way this can be achieved? Basically, this would mean you could predict or read ahead in the array. Any help would really be appreciated!
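One way to "read ahead" without predicting anything is to precompute, in a single backward pass, how many steps remain until the next non-zero entry; a forward loop can then show the countdown and display the associated text when it reaches a non-zero value. The question is about C, but for illustration here is the logic as a Python sketch; the MESSAGES table and the function names are hypothetical placeholders for "the text associated with the number", and both passes use only an array and an index.

```python
from typing import Dict, List, Optional, Sequence


def steps_to_next_nonzero(values: Sequence[int]) -> List[Optional[int]]:
    """Distance from each index to the next non-zero entry.

    0 at a non-zero entry itself, None if no non-zero entry follows.
    """
    result: List[Optional[int]] = [None] * len(values)
    next_index: Optional[int] = None
    for i in range(len(values) - 1, -1, -1):  # one backward pass: O(n)
        if values[i] != 0:
            next_index = i
        result[i] = None if next_index is None else next_index - i
    return result


# Hypothetical text associated with each non-zero code.
MESSAGES: Dict[int, str] = {1: "event one", 2: "event two", 5: "event five"}


def run(values: Sequence[int], messages: Dict[int, str]) -> List[str]:
    """Countdown lines for zeros, associated text for non-zero entries."""
    out: List[str] = []
    for value, steps in zip(values, steps_to_next_nonzero(values)):
        if value != 0:
            out.append(messages.get(value, f"unknown code {value}"))
        elif steps is not None:
            out.append(f"{steps} left")  # countdown restarts after each non-zero
    return out
```

For the example array, the first entries of run's output count down from 4 to 1 before "event one". Trailing zeros after the last non-zero entry produce no countdown line, since there is nothing left to count down to.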
