Read values and list of lists in Haskell - file

Before marking this question as a duplicate: I have already read the topic "Haskell read Integer and list of lists from file", and its solution doesn't solve my problem.
I'm trying to read a file whose lines have this structure:
String, String, [(Int, Int, Int)]
The file looks something like this:
Name1 22/05/2018 [(1, 5, 10), (2, 5, 5), (3, 10, 40)]
Name2 23/05/2018 [(1, 10, 10), (2, 15, 5), (3, 50, 40),(4,20,5)]
Name3 22/05/2018 [(4, 2, 1), (5, 2, 2), (6, 50, 3), (1,2,3)]
Name4 23/05/2018 [(1, 3, 10), (2, 1, 5), (3, 2, 40), (6,20,20)]
I wrote these Haskell functions to read the contents of the file and "convert" them to my custom type:
rlist :: String -> [(Int, Int, Int)]
rlist = read
loadPurchases :: IO [(String, String, [(Int, Int, Int)])]
loadPurchases = do s <- readFile "tes.txt"
                   return (glpurch (map words (lines s)))
glpurch :: [[String]] -> [(String, String, [(Int, Int, Int)])]
glpurch [] = []
glpurch ([name, dt, c]:r) = (name, dt, (rlist c)) : glpurch r
But when I try to execute the "loadPurchases" function, I get this error:
Non-exhaustive patterns in function glpurch.
Using :set -Wall, I received this help message:
<interactive>:6:1: warning: [-Wincomplete-patterns]
Pattern match(es) are non-exhaustive
In an equation for `glpurch':
Patterns not matched:
([]:_:_)
([_]:_)
([_, _]:_)
((_:_:_:_:_):_)
My problem is how to write all these missing cases.
I would be very grateful if anyone could help me write the cases that determine the "stopping condition".

You are only matching lists of length 3 when in fact there are many more words on each line. Just try it in GHCi:
> words "Name1 22/05/2018 [(1, 5, 10), (2, 5, 5), (3, 10, 40)]"
["Name1","22/05/2018","[(1,","5,","10),","(2,","5,","5),","(3,","10,","40)]"]
You probably want to recombine all words past the first two:
glpurch ((name : dt : rest) :r) = (name, dt, (rlist $ unwords rest)) : glpurch r

To solve my problem, I did what @Welperooni and @Thomas M. DuBuisson suggested.
I added this code to my function:
glpurch ((name: dt: c: _): r) = (name, dt, (read c :: [(Cod, Quant, Price)
I also removed the spaces inside the lists in my file, because they made words split each list into several pieces.

Related

Find total number of ways possible to create an array of size M

Suppose I have M = 2 and N = 5 and K = 2 where
M = size of array
N = Maximum number that can be present as an array element
K = Minimum number that can be present as an array element.
So how do I find the number of possible ways to create an array under these conditions? Also, each element must not be greater than the previous one.
The arrays created using the above conditions are
[5,5],[5,4],[5,3],[5,2],[4,4],[4,3],[4,2],[3,3],[3,2],[2,2]
i.e. 10 arrays can be created under these conditions.
I tried doing it by using combinations and factorials, but not getting the desired output. Any help would be appreciated.
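Assuming you are only interested in the number of combinations, the formula is:
(N-K+M)!/(M!(N-K)!)
There are N-K+1 allowed values, and a non-increasing array of size M is just a choice of M of those values with repetition allowed, which gives C((N-K+1)+M-1, M). For the example above, 5!/(2!*3!) = 10.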
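The closed form C(N-K+M, M) = (N-K+M)!/(M!(N-K)!) can be sanity-checked against a brute-force count. This is a sketch: count_arrays and count_brute_force are hypothetical helper names, and math.comb assumes Python 3.8+.

```python
import math
from itertools import product


def count_arrays(m, n, k):
    """Closed form: (N-K+M)! / (M! (N-K)!), i.e. C(N-K+M, M)."""
    return math.comb(n - k + m, m)


def count_brute_force(m, n, k):
    """Enumerate every length-m array over [k, n] and keep the non-increasing ones."""
    values = range(k, n + 1)
    return sum(
        1
        for arr in product(values, repeat=m)
        if all(a >= b for a, b in zip(arr, arr[1:]))
    )


print(count_arrays(2, 5, 2))       # 10
print(count_brute_force(2, 5, 2))  # 10
```

The brute force is exponential in M, so it is only useful for checking small cases.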
This is known as combinations with replacement (itertools.combinations_with_replacement in Python): combination because the order doesn't matter (otherwise it would be a permutation), and with replacement because elements can be repeated, like [5, 5].
import itertools
list(itertools.combinations_with_replacement(range(2, 6), 2))
# [(2, 2), (2, 3), (2, 4), (2, 5), (3, 3), (3, 4), (3, 5), (4, 4), (4, 5), (5, 5)]
If you want the exact ones you listed, you will have to reverse each element, and the list itself.
list(reversed([tuple(reversed(element)) for element in itertools.combinations_with_replacement(range(2,6), 2)]))
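As an alternative to reversing, a sketch that feeds the values in descending order, e.g. range(N, K - 1, -1), so that combinations_with_replacement emits the non-increasing arrays directly, in the same order as the list in the question. The function name non_increasing_arrays is hypothetical.

```python
from itertools import combinations_with_replacement


def non_increasing_arrays(m, n, k):
    """All length-m arrays with elements in [k, n], each element <= the previous one."""
    # Because the pool is in descending order, every emitted tuple is already
    # non-increasing, and the tuples themselves come out in descending order.
    return list(combinations_with_replacement(range(n, k - 1, -1), m))


print(non_increasing_arrays(2, 5, 2))
# [(5, 5), (5, 4), (5, 3), (5, 2), (4, 4), (4, 3), (4, 2), (3, 3), (3, 2), (2, 2)]
```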

How to average similar tuples in an array of tuples in Swift

I really need your help. I have an array of tuples that looks like this:
[("07-21-2016", 5), ("07-21-2016", 1), ("07-21-2016", 2), ("07-21-2016", 3), ("07-21-2016", 4), ("07-21-2016", 5), ("07-20-2016", 6), ("07-20-2016", 5), ("07-19-2016", 5)]
I need to take all the tuples with the same date and average them out. So at the end it would look like:
[("07-21-2016", 3.33), ("07-20-2016", 5.5), ("07-19-2016", 5)]
Does anyone know how to do this?
let array = [("07-21-2016", 5), ("07-21-2016", 1), ("07-21-2016", 2), ("07-21-2016", 3), ("07-21-2016", 4), ("07-21-2016", 5), ("07-20-2016", 6), ("07-20-2016", 5), ("07-19-2016", 5)]
// Create dictionary to hold mapping of date to array of values
var dict = [String: [Double]]()
// use forEach to add each value to the array for each key
array.forEach {(date, num) in dict[date] = (dict[date] ?? []) + [Double(num)]}
// use map with reduce to find the average of each value and return a tuple
// containing the date and the average value
let result = dict.map {(date, nums) in (date, nums.reduce(0, combine: +) / Double(nums.count))}
print(result)
Output:
[("07-20-2016", 5.5), ("07-19-2016", 5.0), ("07-21-2016", 3.3333333333333335)]
Explanation:
array.forEach {(date, num) in dict[date] = (dict[date] ?? []) + [Double(num)]}
forEach takes each tuple of the array, looks up the array of values corresponding to dict[date] and appends the new num to that array. If dict[date] returns nil, then this is the first time we've seen this key, so use the nil coalescing operator ?? to return an empty array [] and append the new value to that.
At the end of this, the contents of dict is:
["07-20-2016": [6.0, 5.0], "07-19-2016": [5.0], "07-21-2016": [5.0, 1.0, 2.0, 3.0, 4.0, 5.0]]
let result = dict.map {(date, nums) in (date, nums.reduce(0, combine: +) / Double(nums.count))}
When map is applied to a dictionary, it takes each (key, value) pair and creates a new value based upon that. The end result of map is a new array of the values it returns. In this case, the value returned for each iteration of map is a tuple containing the date and the average of the numbers associated with that date.
nums.reduce(0, combine: +)
This sums the values in the nums array. reduce takes an initial value (0 in this case) and a closure that will be evaluated for each value in the nums array. Each iteration of reduce takes the current running total and the next value in nums and sums them. This sum is then divided by Double(nums.count) to produce the average. Finally, map returns (date, avg) which produces the final result.
Here's one way (using a dictionary to collate the numbers):
let dateValues = [("07-21-2016", 5), ("07-21-2016", 1), ("07-21-2016", 2), ("07-21-2016", 3), ("07-21-2016", 4), ("07-21-2016", 5), ("07-20-2016", 6), ("07-20-2016", 5), ("07-19-2016", 5)]
var averages:[String:(Int,Int)] = [:]
for (date,value) in dateValues
{
averages[date] = averages[date] ?? (0,0)
averages[date] = (averages[date]!.0 + value, averages[date]!.1 + 1)
}
let averagePerDate = averages.map{($0,Float($1.0)/Float($1.1))}.sort{$0.0>$1.0}
print(averagePerDate)
// [("07-21-2016", 3.33333325), ("07-20-2016", 5.5), ("07-19-2016", 5.0)]
And a more concise one using sets:
let dateList = dateValues.reduce( Set<String>(), combine: { $0.union(Set([$1.0])) })
let dateData = dateList.map{ date in return (date, dateValues.filter({$0.0==date}).map{$0.1}) }
let dateCounts = dateData.map{ ($0, $1.reduce(0,combine:+), Float($1.count) ) }
let dateAverages = dateCounts.map{ ($0, Float($1)/$2 ) }.sort{$0.0>$1.0}
print(dateAverages)

Methods of creating a structured array

I have the following information and I can produce a numpy array of the desired structure. Note that the x and y values have to be generated separately, since their ranges may differ, so I cannot use:
xy = np.random.random_integers(0,10,size=(N,2))
The extra list(...) call around zip is needed in Python 3 (3.4 here), where zip returns an iterator rather than a list; it is unnecessary, but harmless, in Python 2.7.
The following runs without errors:
>>> # attempts to formulate [id,(x,y)] with specified dtype
>>> N = 10
>>> x = np.random.random_integers(0,10,size=N)
>>> y = np.random.random_integers(0,10,size=N)
>>> id = np.arange(N)
>>> dt = np.dtype([('ID','<i4'),('Shape',('<f8',(2,)))])
>>> arr = np.array(list(zip(id,np.hstack((x,y)))),dt)
>>> arr
array([(0, [7.0, 7.0]), (1, [7.0, 7.0]), (2, [5.0, 5.0]), (3, [0.0, 0.0]),
(4, [6.0, 6.0]), (5, [6.0, 6.0]), (6, [7.0, 7.0]),
(7, [10.0, 10.0]), (8, [3.0, 3.0]), (9, [7.0, 7.0])],
dtype=[('ID', '<i4'), ('Shape', '<f8', (2,))])
I cleverly thought I could circumvent the above nasty bits by simply creating the array in the desired vertical structure and applying my dtype to it, hoping that it would work. The stacked array is correct in the vertical form
>>> a = np.vstack((id,x,y)).T
>>> a
array([[ 0, 7, 6],
[ 1, 7, 7],
[ 2, 5, 9],
[ 3, 0, 1],
[ 4, 6, 1],
[ 5, 6, 6],
[ 6, 7, 6],
[ 7, 10, 9],
[ 8, 3, 2],
[ 9, 7, 8]])
I tried several ways of reformulating the above array so that my dtype would work, and I just can't figure it out (this included vstacking a vstack, etc.). So my question is: how can I take the vstack version and get it into a format that meets my dtype requirements without going through the procedure above? I am hoping it is obvious, but I have sliced, stacked and ellipsed myself into an endless loop.
SUMMARY
Many thanks to hpaulj. I have included two incarnations based upon his suggestions for others to consider. The pure numpy solution is substantially faster and a lot cleaner.
"""
Script:   pnts_StackExch
Author:   Dan.Patterson@carleton.ca
Modified: 2015-08-24
Purpose:
    To provide some timing options on point creation in preparation for
    point-to-point distance calculations using einsum.
Reference:
    http://stackoverflow.com/questions/32224220/
    methods-of-creating-a-structured-array
Functions:
    decorators: delta_time, arg_deco
    main: pnts_IdShape, alternate
"""
import numpy as np
import time
from functools import wraps

np.set_printoptions(edgeitems=5, linewidth=75, precision=2, suppress=True, threshold=5)

# .... wrapper funcs .............
def delta_time(func):
    """timing decorator function"""
    @wraps(func)
    def wrapper(*args, **kwargs):
        print("\nTiming function for... {}".format(func.__name__))
        t0 = time.time()                # start time
        result = func(*args, **kwargs)  # ... run the function ...
        t1 = time.time()                # end time
        print("Results for... {}".format(func.__name__))
        print(" time taken ...{:12.9f} sec.".format(t1 - t0))
        #print("\n print results inside wrapper or use <return> ... ")
        return result                   # return the result of the function
    return wrapper

def arg_deco(func):
    """This wrapper just prints some basic function information."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        print("Function... {}".format(func.__name__))
        #print("File....... {}".format(func.__code__.co_filename))
        print(" args.... {}\n kwargs. {}".format(args, kwargs))
        #print(" docs.... {}\n".format(func.__doc__))
        return func(*args, **kwargs)
    return wrapper

# .... main funcs ................
@delta_time
@arg_deco
def pnts_IdShape(N=1000000, x_min=0, x_max=10, y_min=0, y_max=10):
    """Make N points with integer X and Y values drawn uniformly
    from [x_min, x_max] and [y_min, y_max] (list-of-tuples method)
    """
    dt = np.dtype([('ID', '<i4'), ('Shape', ('<f8', (2,)))])
    IDs = np.arange(0, N)
    Xs = np.random.random_integers(x_min, x_max, size=N)
    Ys = np.random.random_integers(y_min, y_max, size=N)
    a = np.array([(i, j) for i, j in zip(IDs, np.column_stack((Xs, Ys)))], dt)
    return IDs, Xs, Ys, a

@delta_time
@arg_deco
def alternate(N=1000000, x_min=0, x_max=10, y_min=0, y_max=10):
    """After hpaulj and his mods to the above: fill the array field by field."""
    dt = np.dtype([('ID', '<i4'), ('Shape', ('<f8', (2,)))])
    IDs = np.arange(0, N)
    Xs = np.random.random_integers(x_min, x_max, size=N)
    Ys = np.random.random_integers(y_min, y_max, size=N)
    c_stack = np.column_stack((IDs, Xs, Ys))
    a = np.ones(N, dtype=dt)
    a['ID'] = c_stack[:, 0]
    a['Shape'] = c_stack[:, 1:]
    return IDs, Xs, Ys, a

if __name__ == "__main__":
    # time testing for various methods
    id_1, xs_1, ys_1, a_1 = pnts_IdShape(N=1000000, x_min=0, x_max=10, y_min=0, y_max=10)
    id_2, xs_2, ys_2, a_2 = alternate(N=1000000, x_min=0, x_max=10, y_min=0, y_max=10)
Timing results for 1,000,000 points are as follows:
Timing function for... pnts_IdShape
Function... pnts_IdShape
 args.... ()
 kwargs. {'N': 1000000, 'y_max': 10, 'x_min': 0, 'x_max': 10, 'y_min': 0}
Results for... pnts_IdShape
 time taken ... 0.680652857 sec.
Timing function for... alternate
Function... alternate
 args.... ()
 kwargs. {'N': 1000000, 'y_max': 10, 'x_min': 0, 'x_max': 10, 'y_min': 0}
Results for... alternate
 time taken ... 0.060056925 sec.
There are 2 ways of filling a structured array (http://docs.scipy.org/doc/numpy/user/basics.rec.html#filling-structured-arrays) - by row (or rows with list of tuples), and by field.
To do this by field, create an empty structured array and assign values by field name:
In [19]: a=np.column_stack((id,x,y))        # same as your vstack().T
In [20]: Y=np.zeros(a.shape[0], dtype=dt)   # empty, ones, etc.
In [21]: Y['ID'] = a[:,0]
In [22]: Y['Shape'] = a[:,1:]               # (2,) field takes a 2 column array
In [23]: Y
Out[23]:
array([(0, [8.0, 8.0]), (1, [8.0, 0.0]), (2, [6.0, 2.0]), (3, [8.0, 8.0]),
(4, [3.0, 2.0]), (5, [6.0, 1.0]), (6, [5.0, 6.0]), (7, [7.0, 7.0]),
(8, [6.0, 1.0]), (9, [6.0, 6.0])],
dtype=[('ID', '<i4'), ('Shape', '<f8', (2,))])
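The same field-by-field approach as a self-contained function. This is a sketch assuming NumPy 1.17+ for np.random.default_rng, the newer replacement for the deprecated random_integers; make_points and its parameters are hypothetical names. Passing endpoint=True keeps the inclusive upper bound that random_integers had, and x and y are drawn separately so their ranges can differ.

```python
import numpy as np

DT = np.dtype([('ID', '<i4'), ('Shape', ('<f8', (2,)))])


def make_points(n, x_range=(0, 10), y_range=(0, 10), seed=None):
    """Build an (n,) structured array with an integer ID and an (x, y) Shape field."""
    rng = np.random.default_rng(seed)
    # Draw x and y independently, each with its own inclusive range
    xs = rng.integers(x_range[0], x_range[1], size=n, endpoint=True)
    ys = rng.integers(y_range[0], y_range[1], size=n, endpoint=True)

    pts = np.zeros(n, dtype=DT)               # empty structured array
    pts['ID'] = np.arange(n)                  # fill one field at a time
    pts['Shape'] = np.column_stack((xs, ys))  # an (n, 2) block fills the (2,) subarray field
    return pts


pts = make_points(5, x_range=(0, 10), y_range=(100, 200), seed=0)
print(pts)
print(pts['Shape'].shape)  # (5, 2)
```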
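On the surface,
arr = np.array(list(zip(id,np.hstack((x,y)))),dt)
looks like an OK way of constructing the list of tuples needed to fill the array. But the result duplicates the values of x instead of using y. The reason is that np.hstack((x,y)) is one flat array of length 2N: zip pairs each id with a single number, stops after the first N of them (all x values), and each scalar is then broadcast into both slots of the (2,) Shape field.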
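The failure can be reproduced in a few lines. This sketch uses small, made-up fixed arrays instead of random ones so the duplicated x values are easy to see, and contrasts the result with the column_stack version.

```python
import numpy as np

dt = np.dtype([('ID', '<i4'), ('Shape', ('<f8', (2,)))])
ids = np.arange(3)
x = np.array([1, 2, 3])
y = np.array([7, 8, 9])

flat = np.hstack((x, y))         # one flat array of length 2N: x values, then y values
pairs = list(zip(ids, flat))     # zip stops after N items, so no y value is ever used
bad = np.array(pairs, dtype=dt)  # each scalar is broadcast into both Shape slots
print(bad['Shape'])              # every row is [x_i, x_i]

good = np.array(list(zip(ids, np.column_stack((x, y)))), dtype=dt)
print(good['Shape'])             # every row is [x_i, y_i]
```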
You can take a view of an array like a if the dtype is compatible: the data buffer for 3 int columns is laid out the same way as one with 3 int fields.
a.view('i4,i4,i4')
But your dtype wants 'i4,f8,f8', a mix of 4- and 8-byte fields, and a mix of int and float. The buffer of a has to be transformed to achieve that, and view can't do it. (Don't even ask about .astype.)
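If a single call is preferred, NumPy 1.16+ provides numpy.lib.recfunctions.unstructured_to_structured, which performs that transformation for you (the data still has to be converted, so the result is a new array, not a zero-copy view). A sketch, assuming a plain (N, 3) integer array laid out like the vstack().T version; the values are made up.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

dt = np.dtype([('ID', '<i4'), ('Shape', ('<f8', (2,)))])

# columns: id, x, y (same layout as np.vstack((id, x, y)).T)
a = np.array([[0, 7, 6],
              [1, 7, 7],
              [2, 5, 9]])

# The 3 values on the last axis are spread over the fields: 1 for 'ID', 2 for 'Shape'
arr = rfn.unstructured_to_structured(a, dt)
print(arr)
```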
The corrected list-of-tuples method:
In [35]: np.array([(i,j) for i,j in zip(id,np.column_stack((x,y)))],dt)
Out[35]:
array([(0, [8.0, 8.0]), (1, [8.0, 0.0]), (2, [6.0, 2.0]), (3, [8.0, 8.0]),
(4, [3.0, 2.0]), (5, [6.0, 1.0]), (6, [5.0, 6.0]), (7, [7.0, 7.0]),
(8, [6.0, 1.0]), (9, [6.0, 6.0])],
dtype=[('ID', '<i4'), ('Shape', '<f8', (2,))])
The list comprehension produces a list like:
[(0, array([8, 8])),
(1, array([8, 0])),
(2, array([6, 2])),
....]
For each tuple in the list, the [0] goes in the first field of the dtype, and [1] (a small array), goes in the 2nd.
The tuples could also be constructed with
[(i,[j,k]) for i,j,k in zip(id,x,y)]
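Both ways of building the tuples fill the array identically. A small self-check with made-up fixed values (a sketch; the field comparison is done per field to keep it simple):

```python
import numpy as np

dt = np.dtype([('ID', '<i4'), ('Shape', ('<f8', (2,)))])
ids = np.arange(4)
x = np.array([8, 8, 6, 8])
y = np.array([8, 0, 2, 8])

# Second tuple item is a small array: one row of column_stack
from_rows = np.array([(i, j) for i, j in zip(ids, np.column_stack((x, y)))], dtype=dt)
# Second tuple item is a plain Python list [x, y]
from_lists = np.array([(i, [j, k]) for i, j, k in zip(ids, x, y)], dtype=dt)

same = (np.array_equal(from_rows['ID'], from_lists['ID'])
        and np.array_equal(from_rows['Shape'], from_lists['Shape']))
print(same)  # True
```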
dt1 = np.dtype([('ID','<i4'),('Shape',('<i4',(2,)))])
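is a view-compatible dtype (still 3 integers), provided a itself holds 4-byte integers as in this session; for int64 data the field types would have to be '<i8'. Note that the view keeps the column axis, so the result has shape (N, 1):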
In [42]: a.view(dtype=dt1)
Out[42]:
array([[(0, [8, 8])],
[(1, [8, 0])],
[(2, [6, 2])],
[(3, [8, 8])],
[(4, [3, 2])],
[(5, [6, 1])],
[(6, [5, 6])],
[(7, [7, 7])],
[(8, [6, 1])],
[(9, [6, 6])]],
dtype=[('ID', '<i4'), ('Shape', '<i4', (2,))])

last element of array matching scala

Good afternoon! I'm using Scala and I want to match the first three elements of a list and the last one, no matter how many elements the list has.
val myList:List[List[Int]] = List(List(3,1,2,3,4),List(23,45,6,7,2),List(3,3,2,1,5,34,43,2),List(8,5,3,34,4,5,3,2),List(3,2,45,56))
def parse(lists: List[Int]): List[Int] = lists.toArray match{
case Array(item, site, buyer, _*, date) => List(item, site, buyer, date)}
myList.map(parse _)
But I get : error: bad use of _* (a sequence pattern must be the last pattern)
I understand why I get it, but how can I avoid it?
My use case is that I'm reading from HDFS, and every file has exactly N columns (N is constant and equal for all files), so I want to match only some of them, without writing something like case Array(item1, item2 , ..., itemN) => List(item1, item2, itemK, itemN)
Thank you!
You do not need to convert lists to Arrays, because lists are designed for pattern matching.
scala> myList.map {
  case item :: site :: buyer :: tail if tail.nonEmpty =>
    item :: site :: buyer :: List(tail.last)
}
res3: List[List[Int]] = List(List(3, 1, 2, 4), List(23, 45, 6, 2), List(3, 3, 2, 2), List(8, 5, 3, 2), List(3, 2, 45, 56))
Or an even more concise solution suggested by Kolmar:
scala> myList.map {
  case item :: site :: buyer :: (_ :+ date) => List(item, site, buyer, date)
}

Type-safe rectangular multidimensional array type

How do you represent a rectangular 2-dimensional (or multidimensional) array data structure in Scala?
That is, each row has the same length, verified at compile time, but the dimensions are determined at runtime?
Seq[Seq[A]] has the desired interface, but it permits the user to provide a "ragged" array, which can result in a run-time failure.
Seq[(A, A, A, A, A, A)] (and similar) does verify that the lengths are the same, but it also forces this length to be specified at compile time.
Example interface
Here's an example interface of what I mean (of course, the inner dimension doesn't have to be tuples; it could be specified as lists or some other type):
// Function that takes a rectangular array
def processArray(arr : RectArray2D[Int]) = {
// do something that assumes all rows of RectArray are the same length
}
// Calling the function (OK)
println(processArray(RectArray2D(
( 0, 1, 2, 3),
(10, 11, 12, 13),
(20, 21, 22, 23)
)))
// Compile-time error
println(processArray(RectArray2D(
( 0, 1, 2, 3),
(10, 11, 12),
(20, 21, 22, 23, 24)
)))
This is possible using the Shapeless library's sized types:
import shapeless._
def foo[A, N <: Nat](rect: Seq[Sized[Seq[A], N]]) = rect
val a = Seq(Sized(1, 2, 3), Sized(4, 5, 6))
val b = Seq(Sized(1, 2, 3), Sized(4, 5))
Now foo(a) compiles, but foo(b) doesn't.
This allows us to write something very close to your desired interface:
case class RectArray2D[A, N <: Nat](rows: Sized[Seq[A], N]*)
def processArray(arr: RectArray2D[Int, _]) = {
// Run-time confirmation of what we've verified at compile-time.
require(arr.rows.map(_.size).distinct.size == 1)
// Do something.
}
// Compiles and runs.
processArray(RectArray2D(
Sized( 0, 1, 2, 3),
Sized(10, 11, 12, 13),
Sized(20, 21, 22, 23)
))
// Doesn't compile.
processArray(RectArray2D(
Sized( 0, 1, 2, 3),
Sized(10, 11, 12),
Sized(20, 21, 22, 23)
))
Using encapsulation to ensure proper size.
import scala.reflect.ClassTag

// Array.ofDim[T] needs a ClassTag for T, hence the context bound
final class Matrix[T: ClassTag]( cols: Int, rows: Int ) {
  private val container: Array[Array[T]] = Array.ofDim[T]( cols, rows )
  def get( col: Int, row: Int ): T = container(col)(row)
  def set( col: Int, row: Int )( value: T ): Unit = { container(col)(row) = value }
}
Note: I misread the question, mistaking a rectangle for a square. Oh, well, if you're looking for squares, this would fit. Otherwise, you should go with @Travis Brown's answer.
This solution may not be the most generic one, but it coincides with the way Tuple classes are defined in Scala.
class Rect[T] private (val data: Seq[T])
object Rect {
def apply[T](a1: (T, T), a2: (T, T)) = new Rect(Seq(a1, a2))
def apply[T](a1: (T, T, T), a2: (T, T, T), a3: (T, T, T)) = new Rect(Seq(a1, a2, a3))
// Continued...
}
Rect(
(1, 2, 3),
(3, 4, 5),
(5, 6, 7))
This is the interface you were looking for and the compiler will stop you if you have invalid-sized rows, columns or type of element.
