I have a mnesia table, let's say employee. I need to find all employee records whose name is in EmployeeNameList = ["Erlich", "Richard", "Gilfoyle", "Dinesh"]. Is there a way to do this using mnesia:select or some other function?
Following the Mnesia documentation, it can be done as follows:
get_employees_by_name(NameList) ->
    %% match any #employee{} record, binding its name field to '$1'
    MatchHead = #employee{name = '$1', _ = '_'},
    Result = '$_',
    %% one match-spec clause per name; the clauses are OR-ed together
    MatchSpec = [{MatchHead, [{'=:=', '$1', Name}], [Result]} || Name <- NameList],
    F = fun() ->
            mnesia:select(employee, MatchSpec)
        end,
    {atomic, Records} = mnesia:transaction(F),
    Records.
I am trying to access a specific value inside an array. The array contains instances of specific classes and is as follows:
[[#<Supermarket:0x007f8e989daef8 @id=1, @name="Easybuy">,
  #<Delivery:0x007f8e989f98a8 @type=:standard, @price=5.0>],
 [#<Supermarket:0x007f8e99039f88 @id=2, @name="Walmart">,
  #<Delivery:0x007f8e989f98a8 @type=:standard, @price=5.0>],
 [#<Supermarket:0x007f8e9901a390 @id=3, @name="Forragers">,
  #<Delivery:0x007f8e989eae20 @type=:express, @price=10.0>]]
I want to iterate over each array inside the array and find out how many Delivery objects have @type set to :standard. Is this possible? Thank you in advance.
# walk the nested arrays, counting the Delivery elements whose @type is :standard
array_of_array.inject(0) do |sum, array|
  sum + array.count { |el| el.class == Delivery && el.instance_variable_get(:@type) == :standard }
end
You can use select() to filter the elements of an array.
Reconstructing your data:
require 'ostruct'
require 'pp'
supermarket_data = [
  ['Easybuy', 1],
  ['Walmart', 2],
  ['Forragers', 3],
]

supermarkets = supermarket_data.map do |(name, id)|
  supermarket = OpenStruct.new
  supermarket.name = name
  supermarket.id = id
  supermarket
end

delivery_data = [
  ['standard', 5.0],
  ['standard', 5.0],
  ['express', 10.0],
]

deliveries = delivery_data.map do |(type, price)|
  delivery = OpenStruct.new
  delivery.type = type
  delivery.price = price
  delivery
end
combined = supermarkets.zip deliveries
pp combined
[[#<OpenStruct name="Easybuy", id=1>,
#<OpenStruct type="standard", price=5.0>],
[#<OpenStruct name="Walmart", id=2>,
#<OpenStruct type="standard", price=5.0>],
[#<OpenStruct name="Forragers", id=3>,
#<OpenStruct type="express", price=10.0>]]
Filtering the array with select():
standard_deliveries = combined.select do |(supermarket, delivery)|
  delivery.type == 'standard'
end
pp standard_deliveries # pretty print
p standard_deliveries.count
[[#<OpenStruct name="Easybuy", id=1>,
#<OpenStruct type="standard", price=5.0>],
[#<OpenStruct name="Walmart", id=2>,
#<OpenStruct type="standard", price=5.0>]]
2
I'm currently trying to filter a large database using Scala. I've written a simple piece of code to match an ID in one database to a list of IDs in another.
Essentially I want to go through database A and, if the ID number in the ID column matches one from database B, extract that entry from database A.
The code I've written works fine, but it's slow (it has to run over a couple of days), and I'm trying to find a way to speed it up. It may be that it can't be sped up much, or it may be that it can be made much faster with better code.
So any help would be much appreciated.
Below is a description of the databases and a copy of the code.
Database A is approximately 10 GB in size with over 100 million entries, and database B has a list of approximately 50,000 IDs.
Each database looks as follows:
Database A:
ID, DataX, date
10, 100, 01012000
15, 20, 01012008
5, 32, 01012006
etc...
Database B:
ID
10
15
12
etc...
My code is as follows:
import scala.io.Source
import java.io._
object filter extends App {

  def ext[T <: Closeable, R](resource: T)(block: T => R): R = {
    try { block(resource) }
    finally { resource.close() }
  }

  val key = io.Source.fromFile("C:\\~Database_B.csv").getLines()
  val key2 = new Array[String](50000)
  key.copyToArray(key2)

  ext(new BufferedWriter(new OutputStreamWriter(new FileOutputStream("C:\\~Output.csv")))) {
    writer =>
      val line = io.Source.fromFile("C:\\~Database_A.csv").getLines.drop(1)
      while (line.hasNext) {
        val data = line.next
        val array = data.split(",").map(_.trim)
        val idA = array(0)
        val dataX = array(1)
        val date = array(2)
        key2.map { idB =>
          if (idA == idB) {
            val print = (idA + "," + dataX + "," + date)
            writer.write(print)
            writer.newLine()
          } else None
        }
      }
  }
}
First, there are far more efficient ways to do this than writing a Scala program. Loading the two tables into a database and doing a join will take about 10 minutes (including data loading) on a modern computer.
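For example, a sketch of that approach using Python's built-in sqlite3 module (the file names and column layout are assumed from your description):

import csv
import sqlite3

conn = sqlite3.connect("filter.db")
conn.execute("CREATE TABLE a (id TEXT, datax TEXT, date TEXT)")
conn.execute("CREATE TABLE b (id TEXT PRIMARY KEY)")

# load Database A, stripping the whitespace seen in the sample rows
with open("Database_A.csv") as f:
    reader = csv.reader(f)
    next(reader)  # skip the header row
    conn.executemany("INSERT INTO a VALUES (?, ?, ?)",
                     ([field.strip() for field in row] for row in reader))

# load the ~50,000 IDs of Database B
with open("Database_B.csv") as f:
    reader = csv.reader(f)
    next(reader)  # skip the header row
    conn.executemany("INSERT OR IGNORE INTO b VALUES (?)", reader)

conn.execute("CREATE INDEX idx_a_id ON a (id)")  # index the join column
conn.commit()

# the join does the filtering
with open("Output.csv", "w") as out:
    writer = csv.writer(out)
    rows = conn.execute("SELECT a.id, a.datax, a.date FROM a JOIN b ON a.id = b.id")
    for row in rows:
        writer.writerow(row)
conn.close()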
Assuming you have to use Scala, there is an obvious improvement. Store your keys as a HashSet and use keys.contains(x) instead of traversing all the keys. This gives you O(1) lookups instead of the O(N) you have now, which should speed up your program significantly.
Minor point -- use string interpolation instead of concatenation, i.e.
s"$idA,$dataX,$date"
// instead of
idA + "," + dataX + "," + date
Try this:
import scala.io.Source
import java.io._
object filter extends App {

  def ext[T <: Closeable, R](resource: T)(block: T => R): R = {
    try { block(resource) }
    finally { resource.close() }
  }

  // convert the key file to a Set for O(1) membership tests
  val key2 = io.Source.fromFile("C:\\~Database_B.csv").getLines().toSet

  ext(new BufferedWriter(new OutputStreamWriter(new FileOutputStream("C:\\~Output.csv")))) {
    writer =>
      val lines = io.Source.fromFile("C:\\~Database_A.csv").getLines.drop(1)
      for (data <- lines) {
        val array = data.split(",").map(_.trim)
        array match {
          case Array(idA, dataX, date) =>
            if (key2.contains(idA)) {
              writer.write(s"$idA,$dataX,$date")
              writer.newLine()
            }
          case _ => // invalid input line: skip it
        }
      }
  }
}
The IDs are now stored in a Set, which gives much better lookup performance.
I made a type, but I don't know how to use it properly and I haven't found any solution on Google.
type Sample =
    {
        TrackPosition : int
        TubePosition : int
        Barcode : string
    }
let arraySamples = Array.create Scenario.Samples.NumberOfSamples BarcodeGenerieren.Samples.Sample

let mutable trackPosition = Scenario.Samples.StartTrackPositions
let mutable index = 1

for i in 1 .. Scenario.Samples.NumberOfSamples do
    let randomNumber = System.Random().Next(0, 9999)
    if index > 24 then
        trackPosition <- trackPosition + 1
        index <- 1
    arraySamples.[index] <- new Sample{TrackPosition = trackPosition, TubePosition = index, Barcode = sprintf "100%s%06d" ((trackPosition + 1) - Scenario.Samples.StartTrackPositions) randomNumber}
So my question is: what should I change so that this works, both where I give the array its type and where I assign a sample with data to the array?
You have created what is referred to as a record type. You can initialise it with the following syntax
{ TrackPosition = 0; TubePosition = 0; Barcode = "string" }
Your syntax in the last line is almost correct; it should be

arraySamples.[index] <-
    { TrackPosition = trackPosition;
      TubePosition = index;
      Barcode = sprintf "100%d%06d" ((trackPosition + 1) - Scenario.Samples.StartTrackPositions) randomNumber }

The changes are:
Eliminate new (a record expression is just braces; the compiler infers the type Sample from the field names)
Replace , with ;
Use %d rather than %s in sprintf, since the track-position arithmetic yields an int
What I'd like to do is add an array of students to each manager (in an array).
This is where I'm getting stuck:
for sup in sups
  do (sup) ->
    sup.students_a = "This one works"
    getStudents sup.CLKEY, (studs) ->
      sup.students_b = "This one doesn't"
cback sups
EDIT: After some thought, what may be happening is that it is adding the "students_b" data to the sups array, but the sups array is being returned (via the cback function) before this work is performed. So I suppose I should move that work into a function and only return sups after another callback fires?
For context, here's the gist of this code:
odbc = require "odbc"

module.exports.run = (managerId, cback) ->
  db2 = new odbc.Database()
  conn = "dsn=mydsn;uid=myuid;pwd=mypwd;database=mydb"
  db2.open conn, (err) ->
    throw err if err

    sortBy = (key, a, b, r) ->
      r = if r then 1 else -1
      return -1 * r if a[key] > b[key]
      return +1 * r if b[key] > a[key]
      return 0

    getDB2Rows = (sql, params, cb) ->
      db2.query sql, params, (err, rows, def) ->
        if err? then console.log err else cb rows

    getManagers = (mid, callback) ->
      supers = []
      queue = []
      querySupers = (id, cb) ->
        sql = "select distinct mycolumns where users.id = ? and users.issupervisor = 1"
        getDB2Rows sql, [id], (rows) ->
          for row in rows
            do (row) ->
              if supers.indexOf(row) is -1 then supers.push row
              if queue.indexOf(row) is -1 then queue.push row
          cb null
      addSupers = (id) -> # todo: add a limit to protect against an infinite loop
        querySupers id, (done) ->
          shiftrow = queue.shift()
          if shiftrow? and shiftrow['CLKEY']? then addSupers shiftrow['CLKEY'] else
            callback supers
      addMain = (id) ->
        sql = "select mycolumns where users.id = ? and users.issupervisor = 1"
        getDB2Rows sql, [id], (rows) ->
          supers.push row for row in rows
      addMain mid
      addSupers mid

    getStudents = (sid, callb) ->
      students = []
      sql = "select mycols from mytables where myconditions and users.supervisor = ?"
      getDB2Rows sql, [sid], (datas) ->
        students.push data for data in datas
        callb students

    console.log "Compiling Array of all Managers tied to ID #{managerId}..."
    getManagers managerId, (sups) ->
      console.log "Built array of #{sups.length} managers"
      sups.sort (a, b) ->
        sortBy('MLNAME', a, b) or # manager's manager
        sortBy('LNAME', a, b)     # manager
      for sup in sups
        do (sup) ->
          sup.students_a = "This one works"
          getStudents sup.CLKEY, (studs) ->
            sup.students_b = "This one doesn't"
      cback sups
You are correct that your callback cback sups is executed before even the first getStudents has executed its callback with the studs array. Since you want to do this for a whole array, it can grow a little hairy with just a for loop.
I always recommend async for these things:
async = require "async" # npm install async

getter = (sup, callback) ->
  getStudents sup.CLKEY, callback

async.map sups, getter, (err, results) ->
  # results is an array of results for each sup
  callback() # <-- this is where you do your final callback
Edit: Or if you want to put students on each sup, you would have this getter:
getter = (sup, callback) ->
  getStudents sup.CLKEY, (studs) ->
    sup.students = studs
    # async expects err as the first parameter to callbacks, as is customary in node
    callback null, sup
Edit: Also, you should probably follow the node custom of passing err as the first argument to all callbacks, and do proper error checking.
I'm building a web app with Django, with PostgreSQL as the database. The app code is getting really messy (my beginner skills being a big factor) and slow, even when I run the app locally.
This is an excerpt of my models.py file:
REPEATS_CHOICES = (
    (NEVER, 'Never'),
    (DAILY, 'Daily'),
    (WEEKLY, 'Weekly'),
    (MONTHLY, 'Monthly'),
    ...some more...
)

class Transaction(models.Model):
    name = models.CharField(max_length=30)
    type = models.IntegerField(max_length=1, choices=TYPE_CHOICES) # 0 = 'Income', 1 = 'Expense'
    amount = models.DecimalField(max_digits=12, decimal_places=2)
    date = models.DateField(default=date.today)
    frequency = models.IntegerField(max_length=2, choices=REPEATS_CHOICES)
    ends = models.DateField(blank=True, null=True)
    active = models.BooleanField(default=True)
    category = models.ForeignKey(Category, related_name='transactions', blank=True, null=True)
    account = models.ForeignKey(Account, related_name='transactions')
The problem is with date, frequency and ends. With this info I can derive all the dates on which a transaction occurs and use that to fill a cashflow table. Doing things this way involves creating a lot of structures (dictionaries, lists and tuples) and iterating over them a lot. Maybe there is a very simple way to solve this with the current schema, but I couldn't figure out how.
I think the app would be easier to code if, when a transaction is created, I could save all of its dates in the db. I don't know if it's possible or if it's a good idea; a rough sketch of what I mean follows.
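Something like this, maybe (Occurrence, FREQ_MAP and build_occurrences are invented names for the sketch, not in my code, and I'm only mapping the 'Monthly' code that my view below handles):

from datetime import datetime, time
from dateutil import rrule
from django.db import models

class Occurrence(models.Model):
    # one row per date a transaction falls on
    transaction = models.ForeignKey(Transaction, related_name='occurrences')
    date = models.DateField(db_index=True)

FREQ_MAP = {4: rrule.MONTHLY}  # extend with the other REPEATS_CHOICES codes

def build_occurrences(tx):
    freq = FREQ_MAP.get(tx.frequency)
    if freq is None or tx.ends is None:
        # 'Never' (or no end date): just the transaction's own date
        Occurrence.objects.create(transaction=tx, date=tx.date)
        return
    start = datetime.combine(tx.date, time())  # rrule wants datetimes
    end = datetime.combine(tx.ends, time())
    for dt in rrule.rrule(freq, dtstart=start, until=end):
        Occurrence.objects.create(transaction=tx, date=dt.date())

Then the cashflow view could just filter Occurrence rows by date range instead of re-deriving the dates every time.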
I'm reading a book about Google App Engine and the datastore's multivalued properties. What do you think about this for solving my problem?
Edit: I didn't know about the PickleField. I'm now reading about it; maybe I could use it to store all of a transaction's datetime objects, along the lines of the sketch below.
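(A sketch using the third-party django-picklefield package; the occurrence_dates field name is made up:)

from picklefield.fields import PickledObjectField

class Transaction(models.Model):
    # ... existing fields ...
    # stores an arbitrary pickled Python object, e.g. a list of datetime objects
    occurrence_dates = PickledObjectField(default=list)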
Edit2: This is an excerpt of my cashflow2 view (sorry for the horrible code):
def cashflow2(request, account_name="Initial"):
    if account_name == "Initial":
        uri = "/cashflow/new_account"
        return HttpResponseRedirect(uri)
    month_info = {}
    cat_info = {}
    m_y_list = []  # [(month, year),]
    trans = []
    min, max = [], []
    account = Account.objects.get(name=account_name, user=request.user)
    categories = account.categories.all()
    for year in range(2006, 2017):
        for month in range(1, 13):
            month_info[(month, year)] = [0, 0, 0]
            for cat in categories:
                cat_info[(cat, month, year)] = 0
    previous_months = 1  # previous months from the current one
    next_months = 5
    dates_list = month_year_list(previous_months, next_months)  # returns [(month, year)] for the requested range
    m_y_list = [(date.month, date.year) for date in month_year_list(1, 5)]
    min, max = dates_list[0], dates_list[-1]
    INCOME = 0
    EXPENSE = 1
    ONHAND = 2
    transacs_in_dates = []
    txs = account.transactions.order_by('date')
    for tx in txs:
        monthyear = (tx.date.month, tx.date.year)
        if tx.frequency == 0:
            if tx.type == 0:
                month_info[monthyear][INCOME] += tx.amount
                if tx.category:
                    cat_info[(tx.category, monthyear[0], monthyear[1])] += tx.amount
            else:
                month_info[monthyear][EXPENSE] += tx.amount
                if tx.category:
                    cat_info[(tx.category, monthyear[0], monthyear[1])] += tx.amount
            if monthyear in m_y_list:
                if tx not in transacs_in_dates:
                    transacs_in_dates.append(tx)
        elif tx.frequency == 4:  # frequency = 'Monthly'
            months_dif = relativedelta.relativedelta(tx.ends, tx.date).months
            if tx.ends.day < tx.date.day:
                months_dif += 1
            years_dif = relativedelta.relativedelta(tx.ends, tx.date).years
            dif = months_dif + (years_dif * 12)
            dates_range = dif + 1
            for i in range(dates_range):
                dt = tx.date + relativedelta.relativedelta(months=+i)
                if (dt.month, dt.year) in m_y_list:
                    if tx not in transacs_in_dates:
                        transacs_in_dates.append(tx)
                if tx.type == 0:
                    month_info[(dt.month, dt.year)][INCOME] += tx.amount
                    if tx.category:
                        cat_info[(tx.category, dt.month, dt.year)] += tx.amount
                else:
                    month_info[(dt.month, dt.year)][EXPENSE] += tx.amount
                    if tx.category:
                        cat_info[(tx.category, dt.month, dt.year)] += tx.amount
    import operator
    thelist = []
    thelist = sorted((my + tuple(v) for my, v in month_info.iteritems()),
                     key=operator.itemgetter(1, 0))
    thelistlist = []
    for atuple in thelist:
        thelistlist.append(list(atuple))
    for i in range(len(thelistlist)):
        if i != 0:
            thelistlist[i][4] = thelistlist[i-1][2] - thelistlist[i-1][3] + thelistlist[i-1][4]
    list = []
    for el in thelistlist:
        if (el[0], el[1]) in m_y_list:
            list.append(el)
    transactions = account.transactions.all()
    cats_in_dates_income = []
    cats_in_dates_expense = []
    for t in transacs_in_dates:
        if t.category and t.type == 0:
            if t.category not in cats_in_dates_income:
                cats_in_dates_income.append(t.category)
        elif t.category and t.type == 1:
            if t.category not in cats_in_dates_expense:
                cats_in_dates_expense.append(t.category)
    cat_infos = []
    for k, v in cat_info.items():
        cat_infos.append((k[0], k[1], k[2], v))
Depends on how relevant App Engine is here. P.S. If you'd like to store pickled objects as well as JSON objects in the Google Datastore, check out these two code snippets:
http://kovshenin.com/archives/app-engine-json-objects-google-datastore/
http://kovshenin.com/archives/app-engine-python-objects-in-the-google-datastore/
Also note that the Google Datastore is a non-relational database, so you might run into other trouble when refactoring your code to switch to it.
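For illustration, a multivalued property in the old google.appengine.ext.db API looks roughly like this (a sketch; the model and field names are made up):

import datetime
from google.appengine.ext import db

class Transaction(db.Model):
    name = db.StringProperty()
    # a multivalued property: the entity stores the whole list of datetimes
    occurrence_dates = db.ListProperty(datetime.datetime)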
Cheers and good luck!