Haskell file reading and finding values

I have recently started learning Haskell and I'm having a hard time figuring out how to interpret text files.
I have the following .txt file:
ncols 5
nrows 5
xllcorner 809970
yllcorner 169790
cellsize 20
NODATA_value -9999
9 0 0 0 0
0 1 0 0 0
0 0 0 0 0
0 0 0 0 0
0 2 0 0 3
The first 6 lines just display some information I need when working with the file in a GIS software. The real deal starts when I try to work with the numbers below in Haskell.
I want to tell Haskell to look up where the numbers 9, 1, 2 and 3 are and print back the number of the row and column where those numbers actually are. In this case Haskell should print:
The value 9 is in row 1 and column 1
The value 1 is in row 2 and column 2
The value 2 is in row 5 and column 2
The value 3 is in row 5 and column 5
I tried finding the solution (or at least similar methods for interpreting files) in tutorials and other Haskell scripts without any success, so any help would be greatly appreciated.

Here is an example of a script that does what you want. Note that in its current form it does not fail gracefully (but given that this is a script, I doubt this is a concern). Make sure there is a trailing newline at the end of your file!
import Control.Monad (replicateM, when)
import Data.Traversable (for)
import System.Environment (getArgs)
main = do
  -- numbers we are looking for
  numbers <- getArgs
  -- get the key-value metadata
  metadata <- replicateM 6 $ do
    [key,value] <- words <$> getLine
    return (key,value)
  let Just rows = read <$> lookup "nrows" metadata
      Just cols = read <$> lookup "ncols" metadata
  -- loop over all the entries
  for [1..rows] $ \row -> do
    rawRow <- words <$> getLine
    for (zip [1..cols] rawRow) $ \(col,cell) ->
      when (cell `elem` numbers)
        (putStrLn ("The value " ++ cell ++ " is in row " ++ show row ++ " and column " ++ show col))
To use it, pass the numbers you are looking for as command-line arguments and feed your data file to it on standard input.
$ ghc script.hs
$ ./script 9 1 2 3 < data.txt
Let me know if you have any questions!
I wasn't really sure if you wanted to look up just a fixed set of numbers, or any non-zero number. As your question asked for the former, that is what I did.

Related

Creating a series of 2-dimensional arrays from a text file in Julia

I'm trying to write a Sudoku solver, which is the fun part. The un-fun part is actually loading the puzzles into Julia from a text file. The text file consists of a series of puzzles comprising a label line followed by 9 lines of digits (0s being used to denote blank squares). The following is a simple example of the sort of text file I am using (sudokus.txt):
Easy 7
000009001
008405670
940000032
034061800
070050020
002940360
890000056
061502700
400700000
Medium 95
000300100
800016070
000009634
001070000
760000015
000020300
592400000
030860002
007002000
Hard 143
000003700
305061000
000200004
067002100
400000003
003900580
200008000
000490308
008100000
What I want to do is strip out the label lines and store the 9x9 grids in an array. File input operations are not my specialist subject, and I've tried various methods such as read(), readcsv(), readlines() and readline(). I don't know whether there is any advantage to storing the digits as characters rather than integers, but leading zeros have to be maintained (a problem I have encountered with some input methods and with abortive attempts to use parse()).
I've come up with a solution, but I suspect it's far from optimal:
function main()
    open("Text Files\\sudokus.txt") do file
        grids = Vector{Matrix{Int}}()
        grid = Matrix{Int}(0,9)
        row_no = 0
        for line in eachline(file)
            if !(all(i -> isnumber(i), line))
                continue
            else
                row_no += 1
                squares = split(line, "")
                row = transpose([parse(Int, square) for square in squares])
                grid = vcat(grid, row)
                if row_no == 9
                    push!(grids, grid)
                    grid = Matrix{Int}(0,9)
                    row_no = 0
                end
            end
        end
        return grids
    end
end
@time main()
I initially ran into @code_warntype problems from the closure, but I seem to have solved those by moving my grids, grid and row_no variables from the main() function to the open block.
Can anyone come up with a more efficient way to achieve my objective or improve my code? Is it possible, for example, to load 10 lines at a time from the text file? I am using Julia 0.6, but solutions using 0.7 or 1.0 will also be useful going forward.
I believe your file is well-structured; by that I mean lines 1, 11, 21, ... each contain the difficulty label and the lines between them contain the sudoku rows. Therefore, if we know the number of lines, we know the number of sudokus in the file. The code uses this information to pre-allocate an array of exactly the size needed.
If your file is too big, you can use eachline instead of readlines. readlines reads all the lines of the file into RAM, while eachline creates an iterator that reads the lines one by one.
function readsudoku(file_name)
    lines = readlines(file_name)
    sudokus = Array{Int}(undef, 9, 9, div(length(lines),10)) # the last dimension is for each sudoku
    for i in 1:length(lines)
        if i % 10 != 1 # if i % 10 == 1 you have difficulty line
            sudokus[(i - 1) % 10, : , div(i-1, 10) + 1] .= parse.(Int, collect(lines[i])) # collect is used to create an array of `Char`s
        end
    end
    return sudokus
end
This should run on 1.0 and 0.7, but I do not know whether it runs on 0.6. You would probably need to remove the undef argument from the Array allocation to make it run on 0.6.
Similar to Hckr's (faster) approach, my first idea is:
s = readlines("sudoku.txt")
smat = reshape(s, 10,3)
sudokus = Dict{String, Matrix{Int}}()
for k in 1:3
    sudokus[smat[1,k]] = parse.(Int, permutedims(hcat(collect.(Char, smat[2:end, k])...), (2,1)))
end
which produces
julia> sudokus
Dict{String,Array{Int64,2}} with 3 entries:
"Hard 143" => [0 0 … 0 0; 3 0 … 0 0; … ; 0 0 … 0 8; 0 0 … 0 0]
"Medium 95" => [0 0 … 0 0; 8 0 … 7 0; … ; 0 3 … 0 2; 0 0 … 0 0]
"Easy 7" => [0 0 … 0 1; 0 0 … 7 0; … ; 0 6 … 0 0; 4 0 … 0 0]

Matlab One Hot Encoding - convert column with categoricals into several columns of logicals

CONTEXT
I have a large number of columns with categoricals, all with different, unrankable choices. To make my life easier for analysis, I'd like to take each of them and convert it to several columns with logicals. For example:
1 GENRE
2 Pop
3 Classical
4 Jazz
...would turn into...
1 Pop Classical Jazz
2 1 0 0
3 0 1 0
4 0 0 1
PROBLEM
I've tried using ind2vec but this only works with numericals or logicals. I've also come across this but am not sure it works with categoricals. What is the right function to use in this case?
If you want to convert from a categorical vector to a logical array, you can use the unique function to generate column indices, then perform your encoding using any of the options from this related question:
% Sample data:
data = categorical({'Pop'; 'Classical'; 'Jazz'; 'Pop'; 'Pop'; 'Jazz'});
% Get unique categories and create indices:
[genre, ~, index] = unique(data)
genre =
Classical
Jazz
Pop
index =
3
1
2
3
3
2
% Create logical matrix:
mat = logical(accumarray([(1:numel(index)).' index], 1))
mat =
6×3 logical array
0 0 1
1 0 0
0 1 0
0 0 1
0 0 1
0 1 0
ind2vec can still be used if you first convert the categorical to a cell array of strings with the cellstr function and then map those strings to indices.
This code may help (adapted from this, with only minor changes):
data = categorical({'Pop'; 'Classical'; 'Jazz';});
GENRE = cellstr(data); %change categorical data into cell strings
[~, loc] = ismember(GENRE, unique(GENRE));
genre = ind2vec(loc')';
Gen=full(genre);
array2table(Gen, 'VariableNames', unique(GENRE))
Running this code returns:
ans =
Classical Jazz Pop
_________ ____ ___
0 0 1
1 0 0
0 1 0
You can call unique(GENRE) to check the categories (as cell strings). Meanwhile, logical(Gen) (or equivalently logical(full(genre))) contains the logical columns you need.
P.S. A categorical array might be faster than a cell array of strings, but the ind2vec function doesn't work with it directly. unique and accumarray might be the better choice.

Turning .txt files into matrices in python

So, I have .txt files with matrices in this form:
5 8
9 -1
0 2
file2:
9 2 -1
7 0 9
-1 7 0
and so on, the dimensions are not set in stone.
What I'm trying to do is create a function that translates the text documents into matrices and later on is capable of doing basic math with said matrices, namely multiplication.
My code is already capable of opening the matrices and reading them to some extent like so:
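Matrix = {}
height = 0  # number of rows, counted while scanning the file
with open(filename, 'r') as file:
    for line in file:
        width = len(line.split())  # number of columns in this row
        height += 1
matrixname = input("enter matrix name")
# zero matrix with `height` rows and `width` columns
Matrix[matrixname] = [[0 for x in range(width)] for y in range(height)]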
but I'm not sure if I'm going in the right direction or what to do next.
Edit*
juanpa's suggested code piece
[[int(i) for i in line.split()] for line in tiedosto]
does the job of handling each value one by one. I now have a function that creates a matrix of zeros with the same dimensions as the original matrix; what I lack is a way to put each value into its corresponding position in the matrix of zeros.
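As a minimal sketch of one way to go from the text file to a matrix that can be multiplied: the list comprehension above already produces the matrix itself, so a separate matrix of zeros is not strictly needed. The helper names read_matrix and multiply are hypothetical (not from the post), and the sketch assumes the file holds whitespace-separated integers with one matrix row per line.

```python
def read_matrix(path):
    """Parse a whitespace-separated text file into a list of rows of ints."""
    with open(path) as handle:
        # Skip blank lines so a trailing newline does not add an empty row.
        return [[int(tok) for tok in line.split()]
                for line in handle if line.strip()]


def multiply(a, b):
    """Return the matrix product a*b for list-of-lists matrices."""
    if not a or not b or len(a[0]) != len(b):
        raise ValueError("inner dimensions do not match")
    # zip(*b) iterates over the columns of b.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]
```

With this, something like Matrix[matrixname] = read_matrix(filename) would store the parsed rows directly, and multiply raises a ValueError when the shapes are incompatible.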

Calculating mean over an array of lists in R

I have an array built to accept the outputs of a modelling package:
M <- array(list(NULL), c(trials,3))
Where trials is a number that will generate circa 50 sets of data.
From a sampling loop, I am inserting a specific aspect of the outputs. The output from the modelling package looks a little like this:
Mt$effects
c_name effect Other
1 DPC_I 0.0818277549 0
2 DPR_I 0.0150814475 0
3 DPA_I 0.0405341027 0
4 DR_I 0.1255416311 0
5 (etc.)
And I am inserting it into my array via a loop
for (x in 1:trials) {
  Mt <- run_model(params)
  M[[x,3]] <- Mt$effects
}
The object now looks as follows
M[,3]
[[1]]
c_name effect Other
1 DPC_I 0.0818277549 0
2 DPR_I 0.0150814475 0
3 DPA_I 0.0405341027 0
4 DR_I 0.1255416311 0
5 (etc.)
[[2]]
c_name effect Other
1 DPC_I 0.0717384637 0
2 DPR_I 0.0190812375 0
3 DPA_I 0.0856456427 0
4 DR_I 0.2330002551 0
5 (etc.)
[[3]]
And so on (up to 50 elements).
What I want to do is calculate an average (and sd) of effect, grouped by each c_name, across each of these 50 trial runs, but I’m unable to extract the data into a single data frame (for example) so that I can run a ddply summarise across them.
I have tried various combinations of rbind, cbind, unlist, but I just can’t understand how to correctly lift this data out of the sequential elements. I note also that any reference to .names results in NULL.
Any solution would be most appreciated!

Look at each row separately in a matrix (Matlab)

I have a matrix in Matlab(2012) with 3 columns and X number of rows, X is defined by the user, so varies each time. For this example though I will use a fixed 5x3 matrix.
So I would like to perform an iterative function on each row within the matrix, while the value in the third column is below a certain value. Then store the new values within the same matrix, so overwrite the original values.
The code below is a simplified version of the problem.
M=[-2 -5 -3 -2 -4]; %Vector containing random values
Vf_X=M+1; %Defining the first column of the matrix
Vf_Y=M+2; %Defining the second column of the matrix
Vf_Z=M; %Defining the third column of the matrix
Vf=[Vf_X',Vf_Y',Vf_Z']; %Creating the matrix
while Vf(:,3)<0
    Vf=Vf+1;
end
disp(Vf)
The result I get is
1 2 0
-2 -1 -3
0 1 -1
1 2 0
-1 0 -2
Ideally I would like to get this result instead
1 2 0
1 2 0
1 2 0
1 2 0
1 2 0
The while will not start if any value is above zero to begin with and stops as soon as one value goes above zero.
I hope this makes sense and I have supplied enough information
Thank you for your time and help.
Your current problem is that you stop iterating the very moment any of the values in the third column breaks the condition. Correct me if I'm wrong, but what I think you want is to continue iterating on the remaining rows until the third-column value of every row breaks the condition.
You could do that like this:
inds = Vf(:,3) < 0; % only rows that still satisfy the condition
while any(inds)
    Vf(inds,:) = Vf(inds,:)+1;
    inds = Vf(:,3) < 0;
end
Of course, for the simple addition you provide, there is a better and faster way:
inds = Vf(:,3)<0;
Vf(inds,:) = bsxfun(@minus, Vf(inds,:), Vf(inds,3));
But for general functions, the while above will do the trick.
