I have created a case class like this:
def case_class(): Unit = {
  case class StockPrice(quarter : Byte,
                        stock : String,
                        date : String,
                        open : Double,
                        high : Double,
                        low : Double,
                        close : Double,
                        volume : Double,
                        percent_change_price : Double,
                        percent_change_volume_over_last_wk : Double,
                        previous_weeks_volume : Double,
                        next_weeks_open : Double,
                        next_weeks_close : Double,
                        percent_change_next_weeks_price : Double,
                        days_to_next_dividend : Double,
                        percent_return_next_dividend : Double
                       )
}
And I have thousands of lines as an Array of String like this:
How can I parse the data from the Array into that case class?
Thank you for your help!
You can proceed as below (I've used a simplified example).
Given your case class and data (lines)
// Your case-class
case class MyCaseClass(
  fieldByte: Byte,
  fieldString: String,
  fieldDouble: Double
)
// input data
val lines: List[String] = List(
Note: you can read lines from a text file as
import scala.io.Source
val lines = Source.fromFile("my_file.txt").getLines().toList
You can have some utility methods for mapping (cleaning & parsing)
// remove '$' symbols from string
def removeDollars(line: String): String = line.replaceAll("\\$", "")
// split string into tokens and
// convert into MyCaseClass object
def parseLine(line: String): MyCaseClass = {
  val tokens: Seq[String] = line.split(",")
  MyCaseClass(
    fieldByte = tokens(0).toByte,
    fieldString = tokens(1),
    fieldDouble = tokens(2).toDouble
  )
}
And then use them to convert strings into case-class objects
// conversion
val myCaseClassObjects: Seq[MyCaseClass] = lines.map(removeDollars).map(parseLine)
As a more advanced (and generalized) approach, you can generate the mapping (parsing) function for converting tokens into fields of your case class using something like reflection, as described here
Here's one way of doing it. I'd recommend splitting everything you do into lots of small, easy-to-manage functions; otherwise you will get lost trying to figure out where something is going wrong if it all starts throwing exceptions.
Data setup:
val array = Array("1,AA,1/7/2011,$15.82,$16.72,$15.78,$16.42,239655616,3.79267,,,$16.71,$15.97,-4.42849,26,0.182704",
case class StockPrice(quarter: Byte, stock: String, date: String, open: Double,
                      high: Double, low: Double, close: Double, volume: Double, percent_change_price: Double,
                      percent_change_volume_over_last_wk: Double, previous_weeks_volume: Double,
                      next_weeks_open: Double, next_weeks_close: Double, percent_change_next_weeks_price: Double,
                      days_to_next_dividend: Double, percent_return_next_dividend: Double
                     )
Function to turn Array[String] into Array[List[String]] and handle any empty fields (I've made an assumption here that you want empty fields to be 0. Change this as necessary):
def splitArray(arr: Array[String]): Array[List[String]] = {
  arr.map(
    _.replaceAll("\\$", "") // Remove $
      .split(",") // Split by ,
      .map {
        case x if x.isEmpty => "0" // If empty
        case y => y // If not empty
      }
      .toList // Array[String] => List[String]
  )
}
Function to turn a List[String] into a StockPrice. Note that this will fall over if the List isn't exactly 16 items long. I'll leave you to handle any of that. Also, the names are pretty non-descriptive so you can change that too. It will also fall over if your data doesn't map to the relevant .toDouble or toByte or whatever - you can handle this yourself too:
def toStockPrice: List[String] => StockPrice = {
  case a :: b :: c :: d :: e :: f :: g :: h :: i :: j :: k :: l :: m :: n :: o :: p :: Nil =>
    StockPrice(a.toByte, b, c, d.toDouble, e.toDouble, f.toDouble, g.toDouble, h.toDouble, i.toDouble, j.toDouble,
      k.toDouble, l.toDouble, m.toDouble, n.toDouble, o.toDouble, p.toDouble)
}
A nice function to bring this all together:
def makeCaseClass(arr: Array[String]): Seq[StockPrice] = {
  val splitArr: Array[List[String]] = splitArray(arr)
  splitArr.map(toStockPrice).toSeq
}

makeCaseClass(array)
// StockPrice(1,AA,1/7/2011,15.82,16.72,15.78,16.42,2.39655616E8,3.79267,0.0,0.0,16.71,15.97,-4.42849,26.0,0.182704),
// StockPrice(1,AA,1/14/2011,16.71,16.71,15.64,15.97,2.42963398E8,-4.42849,1.380223028,2.39655616E8,16.19,15.79,-2.47066,19.0,0.187852),
// StockPrice(1,AA,1/21/2011,16.19,16.38,15.6,15.79,1.38428495E8,-2.47066,-43.02495926,2.42963398E8,15.87,16.13,1.63831,12.0,0.189994),
// StockPrice(1,AA,1/28/2011,15.87,16.63,15.82,16.13,1.51379173E8,1.63831,9.355500109,1.38428495E8,16.18,17.14,5.93325,5.0,0.185989)
To explain the a :: b :: c ..... bit - this is a way of assigning names to items in a List or Seq, given you know the List's size.
val ls = List(1, 2, 3)
val a :: b :: c :: Nil = List(1, 2, 3)
println(a == ls.head) // true
println(b == ls(1)) // true
println(c == ls(2)) // true
Note that the Nil is important because it matches the empty list that terminates every List. Without it, c would be equal to List(3), as the rest of any List is assigned to the last name in your pattern.
You can use this in pattern matching as I have in order to do something with the results:
val ls = List(1, "b", true)
ls match {
  case a :: b :: c if c == true => println("this will not be printed")
  case a :: b :: c :: Nil if c == true => println(s"this will get printed because c == $c")
} // not exhaustive but you get the point
You can also use it if you know what you want each element in the List to be, like this:
val personCharacteristics = List("James", 26, "blue", 6, 85.4, "brown")
val name :: age :: eyeColour :: otherCharacteristics = personCharacteristics
println(s"Name: $name; Age: $age; Eye colour: $eyeColour")
// Name: James; Age: 26; Eye colour: blue
Obviously these examples are pretty trivial and not exactly what you'd see as a professional Scala developer (at least I don't), but it's a handy thing to be aware of as I do still use this :: syntax at work sometimes.
Could anyone tell me why I'm getting the error AttributeError: 'builtin_function_or_method' object has no attribute 'size' on line 57?
It occurs for this syntax: out=np.zeros((x.size,y.size))
import numpy as np
import sympy as sp
from numpy import exp,sqrt,pi
from sympy import Integral, log, exp, sqrt, pi
import math
from numpy import array
import matplotlib.pyplot as plt
import scipy.integrate
from scipy.special import erf
from scipy.stats import norm, gaussian_kde
from quantecon import LAE
from sympy.abc import q
#from sympy import symbols
#q= symbols('q')
## == Define parameters == #
d = (sigma*np.sqrt(2*np.pi))
phi = norm()
n = 500
#Phi(z) = 1/2[1 + erf(z/sqrt(2))].
def p_k_positive(x, y):
    # x, y = np.array(x, dtype=float), np.array(y, dtype=float)
    Positive_RG = norm.pdf(x[:, None] - y[None, :]+Q1, mu, sigma)
    print('Positive_R = ', Positive_RG)
    return Positive_RG

def p_k_negative(x, y):
    # x, y = np.array(x, dtype=float), np.array(y, dtype=float)
    Negative_RG = norm.pdf(x[:, None] - y[None, :]+Q2, mu, sigma)
    print('Negative_RG = ', Negative_RG)
    return Negative_RG

def p_k_zero(x, y):
    # x, y = np.array(x, dtype=float), np.array(y, dtype=float)
    Zero_RG = (1/(2*math.sqrt(2*math.pi)))*(erf((x[:, None]+Q2-mu)/(sigma*math.sqrt(2)))-erf((x[:, None]+Q1-mu)/(sigma*math.sqrt(2))))
    #Zero_RG =norm.pdf
    return Zero_RG

def myFilter(x,y):
    x, y = x.squeeze, y.squeeze
    out=np.zeros((x.size,y.size))
    xyDiff = x[:, None] - y[None, :]
    out=np.where(np.bitwise_and(y[None, :] > 0.0, xyDiff >= -Q1), p_k_positive(x, y), out) # unless the sum functions are different
    out=np.where(np.bitwise_and(y[None, :] < 0.0, x[:, None] >= -Q1), p_k_negative(x, y), out)
    out=np.where(np.bitwise_and(y[None, :] ==0.0, xyDiff >= -Q1), p_k_zero(x, y), out)
    return out
Z = phi.rvs(n)
X = np.empty(n)
for t in range(n-1):
    X[t+1] = X[t] + Z[t]
    #X[t+1] = np.abs(X[t]) + Z[t]
psi_est = LAE(myFilter, X)
k_est = gaussian_kde(X)
fig, ax = plt.subplots(figsize=(10,7))
ys = np.linspace(-200.0, 200.0, 200)
ax.plot(ys, psi_est(ys), 'g-', lw=2, alpha=0.6, label='look ahead estimate')
ax.plot(ys, k_est(ys), 'k-', lw=2, alpha=0.6, label='kernel based estimate')
ax.legend(loc='upper left')
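x, y = x.squeeze, y.squeeze
should be
x, y = x.squeeze(), y.squeeze()
Without the parentheses, x and y are bound to the squeeze methods themselves rather than to the squeezed arrays, so x.size tries to take the size of a function.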
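The difference between referencing a method and calling it can be reproduced in a few lines. This is only a minimal sketch: a plain Python list stands in for the NumPy array so that it runs without NumPy, list.copy stands in for squeeze, and the names values, without_call and with_call are made up for illustration.

```python
# Stand-in data: a plain list plays the role of the NumPy array (assumption).
values = [3.0, 1.0, 2.0]

without_call = values.copy    # no parentheses: the method object itself
with_call = values.copy()     # parentheses: the method runs and returns a list

try:
    without_call.size         # same mistake as x.squeeze followed by x.size
    message = "no error"
except AttributeError as err:
    message = str(err)

print(type(without_call).__name__)  # the type named in the error message
print(message)
print(with_call)
```

The first print shows that without_call is a built-in method object, which is why the interpreter complains about a missing attribute on 'builtin_function_or_method' rather than on an array.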
I'm trying to write a CLR user-defined function in F#, but CREATE ASSEMBLY gives the error:
CREATE ASSEMBLY failed because type 'StringMetrics' in safe assembly 'MyNamespace.SqlServer.Text' has a static field 'field1776#'. Attributes of static fields in safe assemblies must be marked readonly in Visual C#, ReadOnly in Visual Basic, or initonly in Visual C++ and intermediate language.
Here's how it looks in Reflector. This is not a field I've explicitly created.
internal static <PrivateImplementationDetails$MyNamespace-SqlServer-Text>.T1775_18Bytes# field1776#; // data size: 18 bytes
I've tried using a module and a class. Both generate the field, just in different places. What is this field for? Is there a way to avoid its creation? Is there another approach I should be using to create a CLR function in F#? Is it even possible?
Complete Code
namespace MyNamespace.SqlServer.Text

module StringMetrics =

    open System
    open System.Collections.Generic
    open System.Data
    open System.Data.SqlTypes

    let fuzzyMatch (strA:SqlString) (strB:SqlString) =
        if strA.IsNull || strB.IsNull then SqlInt32.Zero
        else
            let comparer = StringComparer.OrdinalIgnoreCase
            let wordBoundaries = [|' '; '\t'; '\n'; '\r'; ','; ':'; ';'; '('; ')'|]
            let stringEquals a b = comparer.Equals(a, b)
            let isSubstring (search:string) (find:string) = find.Length >= search.Length / 2 && search.IndexOf(find, StringComparison.OrdinalIgnoreCase) >= 0
            let split (str:string) = str.Split(wordBoundaries)
            let score (strA:string) (strB:string) =
                if stringEquals strA strB then strA.Length * 3
                else
                    let lenA, lenB = strA.Length, strB.Length
                    if strA |> isSubstring strB then lenA * 2
                    elif strB |> isSubstring strA then lenB * 2
                    else 0
            let arrA, arrB = split strA.Value, split strB.Value
            let dictA, dictB = Dictionary(), Dictionary()
            arrA |> Seq.iteri (fun i a ->
                arrB |> Seq.iteri (fun j b ->
                    match score a b with
                    | 0 -> ()
                    | s ->
                        match dictB.TryGetValue(j) with
                        | true, (s', i') -> //'
                            if s > s' then //'
                                dictA.Add(i, j)
                                dictB.[j] <- (s, i)
                        | _ ->
                            dictA.Add(i, j)
                            dictB.Add(j, (s, i))))
            let matchScore = dictB |> Seq.sumBy (function (KeyValue(_, (s, _))) -> s)
            let nonMatchA =
                arrA
                |> Seq.mapi (fun i a -> i, a)
                |> Seq.fold (fun s (i, a) ->
                    if dictA.ContainsKey(i) then s
                    else s + a.Length) 0
            let wordsB = HashSet(seq { for (KeyValue(i, _)) in dictB -> arrB.[i] }, comparer)
            let nonMatchB =
                arrB |> Seq.fold (fun s b ->
                    if wordsB.Add(b) then s + b.Length
                    else s) 0
            SqlInt32(matchScore - nonMatchA - nonMatchB)
It seems it's generated by the wordBoundaries array. If you express it as a list instead, and convert it to an array at runtime, this internal static field is not generated:
let wordBoundaries = Array.ofList [' '; '\t'; '\n'; '\r'; ','; ':'; ';'; '('; ')']
However, it seems that the function itself is also represented as a static, non-readonly member:
public static SqlInt32 fuzzyMatch(SqlString strA, SqlString strB);
Maybe using a class instead of a module cures this.