Haskell iteration (not literally) over a list - loops

I know I should be forgetting about iterating in functional languages, but I don't know how else to put forth my question.
If I have a list of integers arranged in ascending or descending order, and I have an arbitrary number that may or may not be present in the list, how can I loop over the list to find a number that is smaller than the given number and return that integer?
I just need to know how to go about it.

You could use find to find the first element matching a predicate you specify. Example:
find even [3,5,7,6,2,3,4]
Or, you could drop all the unwanted elements from the left:
dropWhile (not . even) [3,5,7,6,2,3,4]
(and possibly take the first element remaining, which has to be even).
Or, you could filter out unwanted elements:
filter even [3,5,7,6,2,3,4]
Or, you could use recursion and code everything yourself.
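Applied to the question itself, the library functions above might be used as follows. This is only a sketch: it assumes that "a number smaller than the given number" means the largest such element, and the names largestBelowDesc and largestBelowAsc are made up for illustration.

```haskell
import Data.List (find)

-- For a list sorted in descending order, the first element below n
-- is also the largest element below n.
largestBelowDesc :: Int -> [Int] -> Maybe Int
largestBelowDesc n = find (< n)

-- For a list sorted in ascending order, keep the prefix of elements
-- below n and return its last element, if there is one.
largestBelowAsc :: Int -> [Int] -> Maybe Int
largestBelowAsc n xs =
  case takeWhile (< n) xs of
    [] -> Nothing
    ys -> Just (last ys)
```

Both return Nothing when no element is smaller than n, which also covers the empty list.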

You can recursively deconstruct the list with pattern matching:
searchList :: Int -> [Int] -> ???
searchList n [] = ???
searchList n (x:xs) = ???
You check whether x is the number you want, and if not you can call searchList n xs to search the rest of the list.
The normal way to do something like that would be with the library function foldr, but it would be better to understand how to do this with recursion first.
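One possible way to fill in the skeleton is sketched here. The result type Maybe Int is an assumption (the answer leaves it open), and "the number you want" is taken to mean the first element smaller than n.

```haskell
-- Returns the first element that is smaller than n,
-- or Nothing if the list contains no such element.
searchList :: Int -> [Int] -> Maybe Int
searchList _ [] = Nothing              -- nothing left to search
searchList n (x:xs)
  | x < n     = Just x                 -- found a match, stop here
  | otherwise = searchList n xs        -- keep searching the rest
```

For example, searchList 6 [9,7,5,3] walks past 9 and 7 and stops at 5.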

You can have "state" in a list iteration by using a fold - the state is passed from one iteration to the next in a function argument.
An example:
sup :: [Int] -> Int -> Int
sup xs y = go (head xs) xs
  where
    go s [] = s
    go s (x:xs) = if x >= y then s else go x xs
Here s is the "state" -- it is the latest value in the list that is less than y.
Because you said the input list would never be empty, head xs is okay here.
This is almost what you want. Perhaps you can modify it to handle all of your conditions.
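The same idea can be written with an actual fold. The sketch below is a variant, not the code above: the name supFold is made up, the state is a Maybe so that an empty list is handled too, and the list is assumed to be in ascending order.

```haskell
import Data.List (foldl')

-- The accumulator is the "state": the latest element seen so far
-- that is smaller than y, or Nothing if there has been none yet.
supFold :: [Int] -> Int -> Maybe Int
supFold xs y = foldl' step Nothing xs
  where
    step acc x
      | x < y     = Just x   -- x becomes the new state
      | otherwise = acc      -- keep the previous state
```

Unlike go, this fold does not stop early: it always walks the whole list.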

Related

Haskell - Sum of the differences between each element in each matrix

I am very new to Haskell (and functional programming in general) and I am trying to write a function called "profileDistance m1 m2" that takes two matrices as parameters and calculates the sum of the differences between corresponding elements of the two matrices. I might not have explained that very well, so let me show it instead.
The matrices are of the form: [[(Char,Int)]]
where each matrix might look something like this:
m1 = [[('A',1),('A',2)],
      [('B',3),('B',4)],
      [('C',5),('C',6)]]
m2 = [[('A',7),('A',8)],
      [('B',9),('B',10)],
      [('C',11),('C',12)]]
(Note: I wrote the numbers in order in this example but they can be ANY numbers in any order. The chars in each row in each matrix will however match like shown in the example.)
The result (in the case above) would look something like (pseudocode):
result = ((snd m1['A'][0])-(snd m2['A'][0]))+((snd m1['A'][1])-(snd m2['A'][1]))+((snd m1['B'][0])-(snd m2['B'][0]))+((snd m1['B'][1])-(snd m2['B'][1]))+((snd m1['C'][0])-(snd m2['C'][0]))+((snd m1['C'][1])-(snd m2['C'][1]))
This would be easy to do in any language that has for-loops and is non-functional but I have no idea how to do this in Haskell. I have a feeling that functions like map, fold or sum would help me here (admittedly I am not a 100% sure on how fold works). I hope there is an easy way to do this... please help.
Here is a proposal:
solution m1 m2 = sum $ zipWith diffSnd flatM1 flatM2
  where
    diffSnd t1 t2 = snd t1 - snd t2
    flatM1 = concat m1
    flatM2 = concat m2
I wrote it so that it's easier to understand the building blocks.
The basic idea is to iterate simultaneously over our two lists of pairs using zipWith. Here is its type:
zipWith :: (a -> b -> c) -> [a] -> [b] -> [c]
It means it takes a function of type a -> b -> c, a list of a's and a list of b's, and it returns a list of c's. In other words, zipWith takes care of the iteration; you just have to specify what you want to do with every item the iteration yields, which in your case will be a pair of pairs (one from the first matrix, another one from the second).
The function passed to zipWith takes the snd element from each pair and computes the difference. Looking back at the signature of zipWith, you can deduce that it will return a list of numbers. So the last thing we need to do is sum them, using the function sum.
There's one last problem. We do not actually have two lists of pairs to pass to zipWith, but two matrices. We need to "flatten" each of them into a list, preserving the order of the elements. That's exactly what concat does, hence the calls to that function in the definitions of flatM1 and flatM2.
I suggest you look into the implementation of every function I mentioned to get a better grasp of how iteration is expressed by means of recursion. HTH
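Put together with the sample data from the question, the approach might look like the sketch below. The Matrix type alias is an assumption added for readability, and the function is given the name the question asked for.

```haskell
-- Hypothetical alias for the matrix shape described in the question.
type Matrix = [[(Char, Int)]]

-- Flatten both matrices, subtract the paired numbers, and sum the results.
profileDistance :: Matrix -> Matrix -> Int
profileDistance m1 m2 = sum (zipWith diffSnd (concat m1) (concat m2))
  where
    diffSnd (_, a) (_, b) = a - b

-- Sample matrices copied from the question.
m1, m2 :: Matrix
m1 = [[('A',1),('A',2)], [('B',3),('B',4)], [('C',5),('C',6)]]
m2 = [[('A',7),('A',8)], [('B',9),('B',10)], [('C',11),('C',12)]]
```

With these inputs every paired difference is -6, so profileDistance m1 m2 evaluates to -36. Note that zipWith silently stops at the end of the shorter list, so matrices of different sizes are not reported as an error.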

Array vs List in Elm

I was surprised to learn that Array and List were two different types in Elm:
Array
List
In my case, I have a List Int of length 2,000,000 and I need about 10,000 of them but I don't know in advance which ten thousand. That will be provided by another list. In pseudo-code:
x = [ 1,1,0,30,...,255,0,1 ]
y = [ 1,4,7,18,36,..., 1334823 , ... 1899876 ]
z = [ y[x[0]], y[x[1]], ... ]
I am using pseudocode because clearly this isn't Elm syntax (it might be legal JavaScript).
Can these array selections be done in List or Array or both?
List is a linked list which provides O(n) lookup time based on index. Getting an element by index requires traversing the list over n nodes. An index lookup function for List isn't available in the core library but you can use the elm-community/list-extra package which provides two functions for lookup (varying by parameter order): !! and getAt.
Array allows for O(log n) index lookup. Index lookups on Array can be done using Array.get. Arrays are represented as Relaxed Radix Balanced Trees.
Both are immutable (all values in Elm are immutable), so you have trade-offs depending on your situation. List is great when you make a lot of changes because you are merely updating linked list pointers, whereas Array is great for speedy lookup but has somewhat poorer performance for modifications, which you'll want to consider if you're making a lot of changes.
Something like this should work:
import Array
import Debug
fromJust : Maybe a -> a
fromJust x = case x of
    Just y -> y
    Nothing -> Debug.crash "error: fromJust Nothing"
selectFromList : List a -> List Int -> List a
selectFromList els idxs =
    let arr = Array.fromList els
    in List.map (\i -> fromJust (Array.get i arr)) idxs
It converts the input list to an array for fast indexing, then maps the list of indices to their corresponding values in the array. I took the fromJust function from this StackOverflow question.
Only use Array if you need to use Array.get.
In most cases you should use List because usually you can do everything you need with foldl, map, etc. without having to get items from an index, and List has better performance with these functions.

haskell repeat all chars in a string

I just started with Haskell and wanted to write a little function that takes an integer and a String and repeats each char in the String as many times as the integer says.
e.g.: multiply 3 "hello" would output "hhheeelllooo"
My problem now is that i am not sure how to iterate over all the chars.
multiply::Int->String->String
multiply 1 s = s
multiply i s = multiply (i-1) (take 1 s ++ s)
So what I would get is "hhhello". So basically I need to do something like:
mult::Int->String->String
mult 0 s = []
mult 1 s = s
mult i s = "iterate over s, take each char and call a modified version of the multiply method that only takes chars above"
Thank you for helping me out
This gets easier when you use the standard library. First off, repeating an item is done with replicate:
Prelude> replicate 3 'h'
"hhh"
You can then partially apply this function and map it over the string:
Prelude> map (replicate 3) "hello"
["hhh","eee","lll","lll","ooo"]
And finally concat that list of strings into one string:
Prelude> concat (map (replicate 3) "hello")
"hhheeellllllooo"
The composition of concat and map can be abbreviated as concatMap (this is a library function, not a language feature).
Prelude> concatMap (replicate 3) "hello"
"hhheeellllllooo"
So your function becomes
mult n s = concatMap (replicate n) s
For extra brevity, write this in point-free style as
mult = concatMap . replicate
There are many ways to achieve the same effect as you would with a loop in other languages, and larsmans has shown you one way, using map. Another common way is with recursion. You already know what to do with the first character, so you can recurse through the list like so:
multiply n [] = []
multiply n (x:xs) = replicate n x ++ multiply n xs
larsmans has explained how replicate works. For your homework, maybe you're not supposed to use library functions like replicate, so you can replace the call to replicate with your own version.
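A hand-written replacement could look like the following sketch. The name repeatChar is made up; it is meant to behave like replicate specialised to Char.

```haskell
-- Hand-written stand-in for replicate: n copies of the character c.
repeatChar :: Int -> Char -> String
repeatChar n c
  | n <= 0    = []                        -- nothing left to produce
  | otherwise = c : repeatChar (n - 1) c  -- one copy, then the rest

-- Repeat every character of the string n times.
multiply :: Int -> String -> String
multiply _ []     = []
multiply n (x:xs) = repeatChar n x ++ multiply n xs
```

With these definitions multiply 3 "hello" gives "hhheeellllllooo", and a count of zero or less gives the empty string.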
Another way is based on the monadic nature of lists.
You'd like to apply a function to each element of a list.
To do this, just bind the list to the function, like this:
# "hello" >>= replicate 3
Or,
# let f = flip (>>=) . replicate
To remove flip,
# let g = (=<<) . replicate
You can use applicative functors for this:
import Control.Applicative
multiply n = (<* [1..n])
--- multiply 3 "hello" --> "hhheeellllllooo"

Growing arrays in Haskell

I have the following (imperative) algorithm that I want to implement in Haskell:
Given a sequence of pairs [(e0,s0), (e1,s1), (e2,s2),...,(en,sn)], where both the "e" and "s" parts are natural numbers, not necessarily different, at each time step one element of this sequence is randomly selected, let's say (ei,si), and based on the values of (ei,si), a new element is built and added to the sequence.
How can I implement this efficiently in Haskell? The need for random access would make it bad for lists, while the need for appending one element at a time would make it bad for arrays, as far as I know.
Thanks in advance.
I suggest using either Data.Set or Data.Sequence, depending on what you're needing it for. The latter in particular provides you with logarithmic index lookup (as opposed to linear for lists) and O(1) appending on either end.
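A minimal sketch of the Data.Sequence option follows. The derive rule is purely hypothetical, since the question does not say how the new element is computed, and the index is passed in by the caller (it could come from a random number generator) so that the sketch only depends on the containers package.

```haskell
import Data.Sequence (Seq, (|>))
import qualified Data.Sequence as Seq

type Pair = (Int, Int)

-- Hypothetical rule for building a new element from the selected one.
derive :: Pair -> Pair
derive (e, s) = (e + s, s + 1)

-- Look up the element at index i (logarithmic time), derive a new
-- pair from it, and append that pair on the right (constant time).
-- An out-of-range index leaves the sequence unchanged.
growAt :: Int -> Seq Pair -> Seq Pair
growAt i pairs =
  case Seq.lookup i pairs of
    Nothing -> pairs
    Just p  -> pairs |> derive p
```

Because the result is an ordinary immutable value, earlier versions of the sequence remain usable after each step.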
"while the need for appending one element at a time would make it bad for arrays" Algorithmically, it seems like you want a dynamic array (aka vector, array list, etc.), which has amortized O(1) time to append an element. I don't know of a Haskell implementation of it off-hand, and it is not a very "functional" data structure, but it is definitely possible to implement it in Haskell in some kind of state monad.
If you know approximately how many elements you will need in total, then you can create an array of that size which is "sparse" at first, and then put elements into it as needed.
Something like below can be used to represent this new array:
data MyArray = MyArray (Array Int Int) Int
(where the last Int represents how many elements of the array are in use)
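Here is a sketch of how such a preallocated array could be used. The helper names emptyWithCapacity, append and getAt are made up for illustration, and unused slots are assumed to hold 0.

```haskell
import Data.Array (Array, listArray, bounds, (!), (//))

data MyArray = MyArray (Array Int Int) Int

-- Preallocate room for cap elements; none of them is in use yet.
emptyWithCapacity :: Int -> MyArray
emptyWithCapacity cap = MyArray (listArray (0, cap - 1) (replicate cap 0)) 0

-- Store x in the first unused slot; Nothing if the array is full.
append :: Int -> MyArray -> Maybe MyArray
append x (MyArray arr used)
  | used > snd (bounds arr) = Nothing
  | otherwise               = Just (MyArray (arr // [(used, x)]) (used + 1))

-- Read slot i, but only if it has already been filled.
getAt :: Int -> MyArray -> Maybe Int
getAt i (MyArray arr used)
  | i >= 0 && i < used = Just (arr ! i)
  | otherwise          = Nothing
```

One caveat: (//) on an immutable Array builds a new array, so each append here copies all the slots. A mutable array (for example in the ST monad) would be needed to make appends cheap.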
If you really need stop-and-start resizing, you could think about using the simple-rope package along with a StringLike instance for something like Vector. In particular, this might accommodate scenarios where you start out with a large array and are interested in relatively small additions.
That said, adding individual elements into the chunks of the rope may still induce a lot of copying. You will need to try out your specific case, but you should be prepared to use a mutable vector as you may not need pure intermediate results.
If you can build your array in one shot and just need the indexing behavior you describe, something like the following may suffice,
import Data.Array.IArray
test :: Array Int (Int,Int)
test = accumArray (flip const) (0,0) (0,20) [(i, f i) | i <- [0..19]]
  where f 0 = (1,0)
        f i = let (e,s) = test ! (i `div` 2) in (e*2,s+1)
Taking a note from ivanm, I think Sets are the way to go for this.
import Data.Set as Set
import System.Random (RandomGen, getStdGen)
startSet :: Set (Int, Int)
startSet = Set.fromList [(1,2), (3,4)] -- etc. Whatever the initial set is
-- grow the set by randomly producing "n" elements.
growSet :: (RandomGen g) => g -> Set (Int, Int) -> Int -> (Set (Int, Int), g)
growSet g s n | n <= 0 = (s, g)
              | otherwise = growSet g'' s' (n-1)
  where s' = Set.insert (x,y) s
        ((x,_), g') = randElem s g
        ((_,y), g'') = randElem s g'
randElem :: (RandomGen g) => Set a -> g -> (a, g)
randElem = undefined
main = do
  g <- getStdGen
  let (grownSet,_) = growSet g startSet 2
  print $ grownSet -- or whatever you want to do with it
This assumes that randElem is an efficient, definable method for selecting a random element from a Set. (I asked this SO question regarding efficient implementations of such a method). One thing I realized upon writing up this implementation is that it may not suit your needs, since Sets cannot contain duplicate elements, and my algorithm has no way to give extra weight to pairings that appear multiple times in the list.

How to make my Haskell code use Laziness and Garbage collector

I wrote some Haskell code to solve the following problem: we have n files f1, f2, f3, ..., fn, and I cut those files in such a way that each slice has 100 lines:
f1_1, f1_2, f1_3 .... f1_m
f2_1, f2_2, .... f2_n
...
fn_1, fn_2, .... fn_k
finally I construct a special data type (Dags) using slices in the following way
f1_1, f2_1, f3_1, .... fn_1 => Dag1
f1_2, f2_2, f3_2, ..... fn_2 => Dag2
....
f1_k, f2_k, f3_k, ..... fn_k => Dagk
The code that I wrote starts by cutting all the files, then it couples the i-th elements of the resulting lists and constructs a Dag from each element of the final list.
It looks like this:
-- # take a filename and cut the file in slices of 100 lines
sliceFile :: FilePath -> [[String]]
-- # take a list of lists and group the i-th elements into list
coupleIthElement :: [[String]] -> [[String]]
-- # take a list of lines and create a DAG
makeDags :: [String] -> Dag
-- # final code looks like this
makeDag_ :: [FilePath] -> [Dag]
makeDag_ files = map makeDags $ coupleIthElement (concat (map sliceFile files))
The problem is that this code is inefficient because:
it needs to store all the files in memory as lists
the garbage collector cannot work efficiently, since every function needs the result list of the previous function
How could I rewrite my program to take advantage of the garbage collector and of Haskell's laziness?
If that is not possible or not easy, what can I do to be at least a bit more efficient?
thanks for reply
edit
coupleIthElement ["abc", "123", "xyz"] must return ["a1x","b2y","c3z"]
Of course, the 100 lines are actually selected by a particular criterion on some elements of the lines, but I left that out to make the problem easier to understand.
another edit
data Dag = Dag ([(Int, String)], [((Int, Int), Int)]) deriving Show
test_dag = Dag ([(1, "a"),(2, "b"),(3, "c")],[((1,2),1),((1,3),1)])
test_dag2 = Dag ([],[])
The first list holds the vertices, each defined by a number and a label; the second list holds the edges: ((1,2),3) means an edge between vertices 1 and 2 with cost 3.
A few points:
1) Have you considered using fgl? It's probably more efficient than your own Dag implementation. If you really need to use Dag, you could construct your graphs with fgl then convert them to Dag when they're complete.
2) It seems like you don't actually use the slices when constructing your graphs, rather they control how many graphs you have. If so, how about something like this:
import Control.Monad (replicateM, replicateM_)
import System.IO
dagFromHandles :: [Handle] -> IO Dag
dagFromHandles = fmap makeDags . mapM hGetLine
allDags :: [FilePath] -> IO [Dag]
allDags listOfFiles = do
  handles <- mapM (flip openFile ReadMode) listOfFiles
  replicateM 100 (dagFromHandles handles)
This assumes that each file has at least 100 lines, and any extra lines will be ignored. Even better would be if you had a function that would consume a Dag, then you could do
useDag :: Dag -> IO ()
runDags :: [FilePath] -> IO ()
runDags listOfFiles = do
  handles <- mapM (flip openFile ReadMode) listOfFiles
  replicateM_ 100 (dagFromHandles handles >>= useDag)
This should make more efficient use of garbage collection.
Of course this assumes that I understand the problem properly, and I'm not certain that I do. Note that concat (map sliceFile) should be a no-op (sliceFile would need to be in IO as you've defined the type, but ignoring that for now), so I don't see why you're bothering with it at all.
If you don't need to process your file in slices, avoid it. Haskell does this automatically! In Haskell, you think of IO as a stream. Data is read from the input as soon as it's needed and discarded as soon as it's no longer used. So, for instance, this is a simple file-copying program:
main = interact id
interact has the signature interact :: (String -> String) -> IO (), and feeds the input into a function which handles it and produces some output, which is written to stdout. This program can be more efficient than a naive C implementation, as the runtime automatically buffers the input and output.
If you want to understand laziness, you have to forget all the wisdom you learned as an imperative programmer and think of a program as a description of how to transform data, not as a set of instructions. Data is only processed when needed!
The key reason why your data may be handled the wrong way is the multiple traversals of the list. Your function makeDags traverses the transposed list of slices one by one, so the elements of the original list cannot be discarded. What you should try is to write your function in a way like this:
sliceFile :: FilePath -> IO [[String]]
sliceFile fp = do
  f <- readFile fp
  let l = lines f
      slice [] = []
      slice x = ll : slice ls where (ll,ls) = splitAt 100 x
  return (slice l)
sliceFirstRow :: [[String]] -> ([String],[[String]])
sliceFirstRow list = unzip $ map (\(x:xs) -> (x,xs)) list
-- makeDag stands for the question's [String] -> Dag function (named makeDags there)
makeDags :: [[String]] -> [Dag]
makeDags list
  | null list || any null list = [] -- stop once any input is exhausted
  | otherwise = makeDag firstRow : makeDags restOfList
  where (firstRow,restOfList) = sliceFirstRow list
This function may be a solution, since the first row is no longer referenced once it has been processed. But in most places this is a result of laziness, so you could try to use seq to force building the Dags and allow the IO data to be garbage-collected. (If you don't force building the Dags, the data can't be garbage-collected.)
But anyway, I could provide a more helpful answer if you gave some information about what these Dags are.
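For reference, the coupleIthElement behaviour described in the question's edit matches Data.List.transpose, and slicing into groups of 100 lines can be written with splitAt. The sketch below assumes exactly that reading of the question. chunksOf is defined locally here, and the signature of coupleIthElement is generalised from the question's [[String]] so that the "abc" example typechecks.

```haskell
import Data.List (transpose)

-- Split a list into consecutive chunks of at most n elements.
chunksOf :: Int -> [a] -> [[a]]
chunksOf n xs
  | n <= 0 || null xs = []
  | otherwise         = chunk : chunksOf n rest
  where
    (chunk, rest) = splitAt n xs

-- Group the i-th elements of every inner list together.
coupleIthElement :: [[a]] -> [[a]]
coupleIthElement = transpose
```

Both functions are lazy, so only the part of the result that is actually consumed gets built. If the inner lists have different lengths, transpose simply skips the missing elements rather than failing.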

Resources