OCaml How to write into file? - file

I'm looking for a way to write two ints to a file. There will be many pairs of two ints. Between the two numbers there should be a space (I mean ''). For example, something like this:
1 2
6 896
243 865
....

You can use something like this:
let rec print_numbers oc = function
| [] -> ()
| e::tl -> Printf.fprintf oc "%d %d\n" (fst e) (snd e); print_numbers oc tl
let () =
let nums = [(1, 2); (6, 896); (243, 865)] in
let oc = open_out "filename.txt" in
print_numbers oc nums;
close_out oc;
This assumes your data is a list of pairs.

If you use Core, you can do this:
open Core.Std
let () = Out_channel.write_all "your_file.txt" ~data:"Your text"

Related

Combine arrays with conditions in Ruby

I have a class People with three properties
class People
attr_accessor :first_name, :last_name, :age
end
And I have two arrays:
a = [p1, p2]
b = [p3, p4]
Is there any easy way to combine these two arrays in a new array and remove the item with a condition like:
p1.first_name + p1.last_name == p3.first_name + p3.last_name
And after that all the item should be belong to array a
For example
p1.first_name = "Ada"
p1.last_name = "Wang"
p1.age = 28
p2.first_name = "Leon"
p2.last_name = "S"
p2.age = 28
p3.first_name = "Ada"
p3.last_name = "Wang"
p3.age = 18
p4.first_name = "Mario"
p4.last_name = "M"
p4.age = 80
the result should be [p1] the 28 years old Ada.Wang
I'm not sure I get your point, but maybe this is a possible option.
c = a + b
c.uniq! { |e| e.first_name && e.last_name }
Call Array#uniq! with a block on c which is the concatenation of a and b.
If arrays a and b themselves do not contain people with matching first and last names then this would work:
b.each_with_index do |p, i|
if !(b[i].first_name == a[i].first_name and b[i].last_name == a[i].last_name)
a.push(p) # as people p does not contain the same first/last names as a it can now be added to a
end
end
To check for other fields simply replace first_name / last_name with other variables.

Creating a Random Feature Array in Spark DataFrames

When creating an ALS model, we can extract a userFactors DataFrame and an itemFactors DataFrame. These DataFrames contain a column with an Array.
I would like to generate some random data and union it to the userFactors DataFrame.
Here is my code:
val df1: DataFrame = Seq((123, 456, 4.0), (123, 789, 5.0), (234, 456, 4.5), (234, 789, 1.0)).toDF("user", "item", "rating")
val model1 = (new ALS()
.setImplicitPrefs(true)
.fit(df1))
val iF = model1.itemFactors
val uF = model1.userFactors
I then create a random DataFrame using a VectorAssembler with this function:
def makeNew(df: DataFrame, rank: Int): DataFrame = {
var df_dummy = df
var i: Int = 0
var inputCols: Array[String] = Array()
for (i <- 0 to rank) {
df_dummy = df_dummy.withColumn("feature".concat(i.toString), rand())
inputCols = inputCols :+ "feature".concat(i.toString)
}
val assembler = new VectorAssembler()
.setInputCols(inputCols)
.setOutputCol("userFeatures")
val output = assembler.transform(df_dummy)
output.select("user", "userFeatures")
}
I then create the DataFrame with new user IDs and add the random vectors and bias:
val usersDf: DataFrame = Seq(567), (678)).toDF("user")
var usersFactorsNew: DataFrame = makeNew(usersDf, 20)
The problem arises when I union the two DataFrames.
usersFactorsNew.union(uF) produces the error:
org.apache.spark.sql.AnalysisException: Union can only be performed on tables with the compatible column types. struct<type:tinyint,size:int,indices:array<int>,values:array<double>> <> array<float> at the second column of the second table;;
If I print the schema, the uF DataFrame has a feature vector of type Array[Float] and the usersFactorsNew DataFrame as a feature vector of type Vector.
My question is how to change the type of the Vector to an Array in order to perform the union.
I tried writing this udf with little success:
val toArr: org.apache.spark.ml.linalg.Vector => Array[Double] = _.toArray
val toArrUdf = udf(toArr)
Perhaps the VectorAssembler is not the best option for this task. However, at the moment, it is the only option I have found. I would love to get some recommendations for something better.
Instead of creating a dummy dataframe and using VectorAssembler to generate a random feature vector, you can simply use an UDF directly. The userFactors from the ALS model will return an Array[Float] so the output from the UDF should match that.
val createRandomArray = udf((rank: Int) => {
Array.fill(rank)(Random.nextFloat)
})
Note that this will give numbers in the interval [0.0, 1.0] (same as the rand() used in the question), if other numbers are required, modify as fit.
Using a rank of 3 and the userDf:
val usersFactorsNew = usersDf.withColumn("userFeatures", createRandomArray(lit(3)))
will give a dataframe as follows (of course with random feature values)
+----+----------------------------------------------------------+
|user|userFeatures |
+----+----------------------------------------------------------+
|567 |[0.6866711267486822,0.7257031656127676,0.983562255688249] |
|678 |[0.7013908820314967,0.41029552817665327,0.554591149586789]|
+----+----------------------------------------------------------+
Joining this dataframe with the uF dataframe should now be possible.
The reason the UDF did not work should be due to it being an Array[Double] while you need anArray[Float]for theunion. It should be possible to fix with amap(_.toFloat)`.
val toArr: org.apache.spark.ml.linalg.Vector => Array[Float] = _.toArray.map(_.toFloat)
val toArrUdf = udf(toArr)
All of your process are all correct. Even the udf function is working successfully. All you need to do is change the last part of makeNew function as
def makeNew(df: DataFrame, rank: Int): DataFrame = {
var df_dummy = df
var i: Int = 0
var inputCols: Array[String] = Array()
for (i <- 0 to rank) {
df_dummy = df_dummy.withColumn("feature".concat(i.toString), rand())
inputCols = inputCols :+ "feature".concat(i.toString)
}
val assembler = new VectorAssembler()
.setInputCols(inputCols)
.setOutputCol("userFeatures")
val output = assembler.transform(df_dummy)
output.select(col("id"), toArrUdf(col("userFeatures")).as("features"))
}
and you should be perfect to go so that when you do (I created userDf with id column and not user column)
val usersDf: DataFrame = Seq((567), (678)).toDF("id")
var usersFactorsNew: DataFrame = makeNew(usersDf, 20)
usersFactorsNew.union(uF).show(false)
you should be getting
+---+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|id |features |
+---+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|567|[0.8259185719733708, 0.327713892339658, 0.049547223031371046, 0.056661808506210054, 0.5846626163454274, 0.038497936270104005, 0.8970865088803417, 0.8840660648882804, 0.837866669938156, 0.9395263094918058, 0.09179528484355126, 0.4915430644129799, 0.11083447052043116, 0.5122858182953718, 0.4302683812966408, 0.3862741815833828, 0.6189322403095068, 0.3000371006293433, 0.09331299668168902, 0.7421838728601371, 0.855867963988993]|
|678|[0.7686514248005568, 0.5473580740023187, 0.072945344124282, 0.36648594574355287, 0.9780202082328863, 0.5289221651923784, 0.3719451099963028, 0.2824660794505932, 0.4873197501260199, 0.9364676464120849, 0.011539929543513794, 0.5240615794930654, 0.6282546154521298, 0.995256022569878, 0.6659179561266975, 0.8990775317754092, 0.08650071017556926, 0.5190186149992805, 0.056345335742325475, 0.6465357505620791, 0.17913532817943245] |
|123|[0.04177388548851013, 0.26762014627456665, -0.19617630541324615, 0.34298020601272583, 0.19632814824581146, -0.2748605012893677, 0.07724890112876892, 0.4277132749557495, 0.1927199512720108, -0.40271613001823425] |
|234|[0.04139673709869385, 0.26520395278930664, -0.19440513849258423, 0.3398836553096771, 0.1945556253194809, -0.27237895131111145, 0.07655145972967148, 0.42385169863700867, 0.19098000228405, -0.39908021688461304] |
+---+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

Using Array.Count and match cases F#

I am not sure yet what the problem is, I am trying to go through a ResizeArray and matching the item with the data type, and depending on this, take away the value in a specific field (iSpace) from thespace(which is how much space the inventory has), before returning the final value.
A snippet of my code :
let spaceleft =
let mutable count = 0 //used to store the index to get item from array
let mutable thespace = 60 //the space left in the inventory
printf "Count: %i \n" inventory.Count //creates an error
while count < inventory.Count do
let item = inventory.[count]
match item with
|Weapon weapon ->
thespace <- (thespace - weapon.iSpace)
|Bomb bomb ->
thespace <-(thespace - bomb.iSpace)
|Potion pot ->
thespace <- (thespace - pot.iSpace)
|Armour arm ->
thespace <- (thespace - arm.iSpace)
count <- count+1
thespace
I get an error about Int32, that has to do with the
printf "Count: %i \n" inventory.Count
line
Another problem is that thespace doesn't seem to change, and always returns as 60, although I have checked and inventory is not empty, it always has at least two items, 1 weapon and 1 armour, so thespace should atleast decrease yet it never does.
Other snippets that may help:
let inventory = ResizeArray[]
let initialise =
let mutable listr = roominit
let mutable curroom = 3
let mutable dead = false
inventory.Add(Weapon weap1)
inventory.Add(Armour a1)
let spacetogo = spaceleft //returns 60, although it should not
Also, apart from the iniitialise function, other functions seem not to be able to add items to the inventory properly, eg:
let ok, input = Int32.TryParse(Console.ReadLine())
match ok with
|false ->
printf "The weapon was left here \n"
complete <- false
|true ->
if input = 1 && spaceleft>= a.iSpace then
inventory.Add(Weapon a)
printf "\n %s added to the inventory \n" a.name
complete <- true
else
printf "\n The weapon was left here \n"
complete <- false
complete
You have spaceLeft as a constant value. To make it a function you need to add unit () as a parameter. Here's that change including a modification to make it much simpler (I've included my dummy types):
type X = { iSpace : int }
type Item = Weapon of X | Bomb of X | Potion of X | Armour of X
let inventory = ResizeArray [ Weapon {iSpace = 2}; Bomb {iSpace = 3} ]
let spaceleft () =
let mutable thespace = 60 //the space left in the inventory
printf "Count: %i \n" inventory.Count
for item in inventory do
let itemSpace =
match item with
| Weapon w -> w.iSpace
| Bomb b -> b.iSpace
| Potion p -> p.iSpace
| Armour a -> a.iSpace
thespace <- thespace - itemSpace
thespace
spaceleft () // 55
The above code is quite imperative. If you want to make it more functional (and simpler still) you can use Seq.sumBy:
let spaceleft_functional () =
printf "Count: %i \n" inventory.Count
let spaceUsed =
inventory
|> Seq.sumBy (function
| Weapon w -> w.iSpace
| Bomb b -> b.iSpace
| Potion p -> p.iSpace
| Armour a -> a.iSpace)
60 - spaceUsed
Just adding to the accepted answer: you can also match against record labels, as long as your inner types are records. Combine with an intrinsic type extension on the outer DU:
type X = { iSpace : int }
type Y = { iSpace : int }
type Item = Weapon of X | Bomb of Y | Potion of X | Armour of X
let inventory = ResizeArray [ Weapon {iSpace = 2}; Bomb {iSpace = 3} ]
let itemSpace = function
| Weapon { iSpace = s } | Bomb { iSpace = s }
| Potion { iSpace = s } | Armour { iSpace = s } -> s
type Item with static member (+) (a, b) = a + itemSpace b
60 - (Seq.fold (+) 0 inventory)
// val it : int = 55
Otherwise, you could resort to member constraint invocation expressions.
let inline space (x : ^t) = (^t : (member iSpace : int) (x))

Array of size n to array n x 2 using Java 8 stream

I am trying to figure out the most elegant way of converting a simple int array (e.g. {1, 2, 3}) to a simple String array (e.g. {"id", "1", "id", "2", "id", "3"}) of String pairs using Java 8 streams.
Traditionally the code looks like this: -
int[] input = {1, 2, 3};
String[] output = new String[input.length * 2];
int i = 0;
for (int val : input) {
output[i++] = "id";
output[i++] = String.valueOf(val);
}
But assuming this can be done in a 1-liner in Java 8.
String[] result = Arrays.stream(input)
.mapToObj(x -> new String[] { "id", "" + x })
.flatMap(Arrays::stream)
.toArray(String[]::new);
Or may be a bit more verbose (but worse since we are first joining, only to split immediately after)
String[] result = Arrays.stream(input)
.mapToObj(x -> "id" + "," + x)
.collect(Collectors.joining(","))
.split(",");
I can think of these two, but it's hardly more readable of what you already have in place with a simple for loop.
Can make it even less readable than Eugene's solution:
String[] output = IntStream.range(0, input.length * 2)
.mapToObj(x -> x % 2 == 0 ? "id" : input[x / 2 ] + "")
.toArray(String[]::new);
And another variation of this can be next:
String[] result = Arrays.stream( input )
.boxed()
.flatMap( x -> Stream.of( "id", Integer.toString( x ) ) )
.toArray( String[]::new );

Auto-generated static field in F# class/module prevents creation of SQL CLR function

I'm trying to write a CLR user-defined function in F#, but CREATE ASSEMBLY gives the error:
CREATE ASSEMBLY failed because type 'StringMetrics' in safe assembly 'MyNamespace.SqlServer.Text' has a static field 'field1776#'. Attributes of static fields in safe assemblies must be marked readonly in Visual C#, ReadOnly in Visual Basic, or initonly in Visual C++ and intermediate language.
Here's how it looks in Reflector. This is not a field I've explicitly created.
[DebuggerBrowsable(DebuggerBrowsableState.Never)]
internal static <PrivateImplementationDetails$MyNamespace-SqlServer-Text>.T1775_18Bytes# field1776#; // data size: 18 bytes
I've tried using a module and a class. Both generate the field, just in different places. What is this field for? Is there a way to avoid its creation? Is there another approach I should be using to create a CLR function in F#? Is it even possible?
Complete Code
namespace MyNamespace.SqlServer.Text
module StringMetrics =
open System
open System.Collections.Generic
open System.Data
open System.Data.SqlTypes
[<CompiledName("FuzzyMatch")>]
let fuzzyMatch (strA:SqlString) (strB:SqlString) =
if strA.IsNull || strB.IsNull then SqlInt32.Zero
else
let comparer = StringComparer.OrdinalIgnoreCase
let wordBoundaries = [|' '; '\t'; '\n'; '\r'; ','; ':'; ';'; '('; ')'|]
let stringEquals a b = comparer.Equals(a, b)
let isSubstring (search:string) (find:string) = find.Length >= search.Length / 2 && search.IndexOf(find, StringComparison.OrdinalIgnoreCase) >= 0
let split (str:string) = str.Split(wordBoundaries)
let score (strA:string) (strB:string) =
if stringEquals strA strB then strA.Length * 3
else
let lenA, lenB = strA.Length, strB.Length
if strA |> isSubstring strB then lenA * 2
elif strB |> isSubstring strA then lenB * 2
else 0
let arrA, arrB = split strA.Value, split strB.Value
let dictA, dictB = Dictionary(), Dictionary()
arrA |> Seq.iteri (fun i a ->
arrB |> Seq.iteri (fun j b ->
match score a b with
| 0 -> ()
| s ->
match dictB.TryGetValue(j) with
| true, (s', i') -> //'
if s > s' then //'
dictA.Add(i, j)
dictB.[j] <- (s, i)
| _ ->
dictA.Add(i, j)
dictB.Add(j, (s, i))))
let matchScore = dictB |> Seq.sumBy (function (KeyValue(_, (s, _))) -> s)
let nonMatchA =
arrA
|> Seq.mapi (fun i a -> i, a)
|> Seq.fold (fun s (i, a) ->
if dictA.ContainsKey(i) then s
else s + a.Length) 0
let wordsB = HashSet(seq { for (KeyValue(i, _)) in dictB -> arrB.[i] }, comparer)
let nonMatchB =
arrB |> Seq.fold (fun s b ->
if wordsB.Add(b) then s + b.Length
else s) 0
SqlInt32(matchScore - nonMatchA - nonMatchB)
It seems it's generated by the wordBoundaries array. If you express it as a list instead, and convert it to an array at runtime, this internal static field is not generated:
let wordBoundaries = Array.ofList [' '; '\t'; '\n'; '\r'; ','; ':'; ';'; '('; ')']
However, it seems that also the function itself is represented as a static, non-readonly
field:
public static SqlInt32 fuzzyMatch(SqlString strA, SqlString strB);
Maybe using a class instead of a module cures this.

Resources