I have this app where there is a Games table and a Players table, and they share an n:n association.
This association is mapped in Phoenix through a GamesPlayers schema.
What I'm wondering how to do is actually quite simple: I'd like there to be an adjustable limit of how many players are allowed per game.
If you need more details, carry on reading, but if you already know an answer feel free to skip the rest!
What I've Tried
I've taken a look at adding check constraints, but without much success. Here's roughly what the check constraint would need to look like:
create constraint("games_players", :limit_players, check: "count(players) <= player_limit")
The problem is that this check syntax is plainly invalid, and I don't think there is a valid way to achieve this with this call.
I've also looked into adding a trigger to the Postgres database directly in order to enforce this (something very similar to what this answer proposes), but I'm wary of fiddling with the DB directly, since I should only be using Ecto's interface.
Table Schemas
For the purposes of this question, let's assume this is what the tables look like:
Games
Property       Type
-----------------------
id             integer
player_limit   integer

Players
Property       Type
-----------------------
id             integer

GamesPlayers
Property       Type
-----------------------------------
game_id        references(Games)
player_id      references(Players)
As I mentioned in my comment, I think the cleanest way to enforce this is via business logic inside the code, not via a database constraint. I would approach this using a database transaction, which Ecto supports via Ecto.Repo.transaction/2. This guards against races between the count and the insert (for complete safety you would also lock the game row or use a stricter isolation level).
In this case I would do something like the following:
begin the transaction
perform a SELECT query counting the number of players in the given game; if the game is already full, abort the transaction, otherwise, continue
perform an INSERT query to add the player to the game
complete the transaction
In code, this would boil down to something like this (untested):
import Ecto.Query

alias MyApp.Repo
alias MyApp.GamesPlayers

@max_allowed_players 10

def add_player_to_game(player_id, game_id, opts \\ []) do
  max_allowed_players = Keyword.get(opts, :max_allowed_players, @max_allowed_players)

  Repo.transaction(fn ->
    case is_game_full?(game_id, max_allowed_players) do
      false ->
        %GamesPlayers{game_id: game_id, player_id: player_id}
        |> Repo.insert!()

      # Raising an error rolls the transaction back
      true ->
        raise "Game #{inspect(game_id)} full; cannot add player #{inspect(player_id)}"
    end
  end)
end

defp is_game_full?(game_id, max_allowed_players) do
  current_players =
    from(r in GamesPlayers,
      where: r.game_id == ^game_id,
      select: count(r.id)
    )
    |> Repo.one()

  current_players >= max_allowed_players
end
Another way to ask my question is: "Should I keep the data types coming from the database as simple and raw as I would require them from my REST endpoint?"
Imagine this case class that I want to store in the database as a row:
case class Product(id: UUID,name: String, price: BigInt)
It clearly isn't, and shouldn't be, what it says it is, because the type signatures of name and price are a lie.
So what we do is create custom data types that better represent what things are, such as the following (for the sake of simplicity, imagine our only concern is the price data type):
case class Price(value: BigInt) {
require(value > BigInt(0))
}
object Price {
def validate(amount: BigInt): Either[String,Price] =
Try(Price(amount)).toOption.toRight("invalid.price")
}
//As a result my Product class is now:
case class Product(id: UUID,name: String,price: Price)
So now the process of taking user input for product data would look like this:
//this class would be parsed from i.e a form:
case class ProductInputData(name: String, price: BigInt)
def create(input: ProductInputData) = {
for {
validPrice <- Price.validate(input.price)
} yield productsRepo.insert(
Product(id = UUID.randomUUID,name = input.name,price = ???)
)
}
Look at the triple question marks (???). This is my main point of concern from an overall application architecture perspective. If I had the ability to store a column as Price in the database (Slick, for example, supports such custom data types), then I'd have the option to store the price as either price: BigInt = validPrice.value or price: Price = validPrice.
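For reference, here is a rough sketch of how such a mapping could look with Slick 3's MappedColumnType (the object and value names below are mine, purely illustrative; only the Price type from above is assumed):
import slick.jdbc.PostgresProfile.api._

object PriceMapping {
  // The column stays a plain numeric type in the database, but rows are
  // read and written as the domain type Price.
  implicit val priceColumnType: BaseColumnType[Price] =
    MappedColumnType.base[Price, BigDecimal](
      price   => BigDecimal(price.value),  // Price -> column value
      dbValue => Price(dbValue.toBigInt)   // column value -> Price (the require runs here)
    )
  // With this implicit in scope, a Table definition can declare
  // def price = column[Price]("price") and map rows directly to Product.
}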
I see so many pros and cons in both of these decisions and I can't decide.
Here are the arguments that I see supporting each choice:
Store data as simple database types (e.g. BigInt) because:
Performance: a simple assertion of x > 0 on the creation of Price is trivial, but imagine you want to validate a custom Email type with a complex regex; that would be detrimental when retrieving collections.
Tolerance against corruption: if a BigInt is inserted as a negative value, it wouldn't explode in your face every time your application tried to simply read the column and throw it onto the user interface. It would, however, cause problems if it got retrieved and then involved in some domain-layer processing such as a purchase.
Store data as its domain-rich type (e.g. Price) because:
No implicit reasoning and trust: another method somewhere else in the system would need the price to be valid. For example:
//two terrible variations of a calculateDiscount method:
//this version simply trusts that price is already valid and came from db:
def calculateDiscount(price: BigInt): BigInt = {
//apply some positive coefficient to price and hopefully get a positive
//number from it and if it's not positive because price is not positive then
//it'll explode in your face.
}
//this version is even worse. It does retain function totality and purity
//but the unforgivable culture it encourages is the kind of defensive and
//paranoid programming that causes every developer to write guard
//expressions performing duplicated validation all over!
def calculateDiscount(price: BigInt): Option[BigInt] = {
if (price <= BigInt(0))
None
else
Some{
//Do safe processing
}
}
//ideally you want it to look like this:
def calculateDiscount(price: Price): Price
No constant conversion of domain types to simple types and vice versa for representation, storage, the domain layer, and so on; you simply have one representation in the system to rule them all.
The source of all this mess, as I see it, is the database. If the data were coming from the user it would be easy: you simply never trust it to be valid; you ask for simple data types, cast them to domain types with validation, and then proceed. But not the DB. Does modern layered architecture address this issue in some definitive, or at least mitigating, way?
Protect the integrity of the database. Just as you would protect the integrity of the internal state of an object.
Trust the database. It doesn't make sense to check and re-check what has already been checked going in.
Use domain objects for as long as you can. Wait till the very last moment to give them up (raw JDBC code or right before the data is rendered).
Don't tolerate corrupt data. If the data is corrupt, the application should crash. Otherwise it's likely to produce more corrupt data.
The overhead of the require call when retrieving from the DB is negligible. If you really think it's an issue, provide 2 constructors, one for the data coming from the user (performs validation) and one that assumes the data is good (meant to be used by the database code).
I love exceptions when they point to a bug (data corruption because of insufficient validation on the way in).
That said, I regularly leave requires in code to help catch bugs in more complex validation (maybe data coming from multiple tables combined in some invalid way). The system still crashes (as it should), but I get a better error message.
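A minimal sketch of that two-constructor idea, using a variant of the Price type from the question (the fromDb name and the private constructor are mine, purely illustrative):
final class Price private (val value: BigInt)

object Price {
  // Constructor for untrusted input (forms, REST payloads): validates and reports failure.
  def validate(amount: BigInt): Either[String, Price] =
    if (amount > BigInt(0)) Right(new Price(amount)) else Left("invalid.price")

  // Constructor for rows read from the database: assumes the value was
  // validated on the way in, but keeps a cheap require so corrupt data
  // crashes loudly instead of propagating.
  def fromDb(amount: BigInt): Price = {
    require(amount > BigInt(0), s"corrupt price read from DB: $amount")
    new Price(amount)
  }
}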
I am working on a problem where I need to remember the order of events I receive, but I also need to look up an event by its ID. How can I do this efficiently in Erlang, if possible without a third-party library? Note that I have many, potentially ephemeral, actors, each with their own events (I already considered mnesia, but it requires atoms for the tables, and the tables would stick around if my actor died).
-record(event, {id, timestamp, type, data}).
Based on the details included in the discussion in comments on Michael's answer, a very simple, workable approach would be to create a tuple in your process state variable that stores the order of events separately from the K-V store of events.
Consider:
%%% Some type definitions so we know exactly what we're dealing with.
-type id() :: term().
-type type() :: atom().
-type data() :: term().
-type ts() :: calendar:datetime().
-type event() :: {id(), ts(), type(), data()}.
-type events() :: dict:dict(id(), {type(), data(), ts()}).
% State record for the process.
% Should include whatever else the process deals with.
-record(s,
        {log    :: [id()],
         events :: events()}).
%%% Interface functions we will expose over this module.
-spec lookup(pid(), id()) -> {ok, event()} | error.
lookup(Pid, ID) ->
gen_server:call(Pid, {lookup, ID}).
-spec latest(pid()) -> {ok, event()} | error.
latest(Pid) ->
gen_server:call(Pid, get_latest).
-spec notify(pid(), event()) -> ok.
notify(Pid, Event) ->
gen_server:cast(Pid, {new, Event}).
%%% gen_server handlers
handle_call({lookup, ID}, _From, State = #s{events = Events}) ->
    Result = find(ID, Events),
    {reply, Result, State};
handle_call(get_latest, _From, State = #s{log = [Last | _], events = Events}) ->
    Result = find(Last, Events),
    {reply, Result, State};
% ... and so on...
handle_cast({new, Event}, State) ->
{ok, NewState} = catalog(Event, State),
{noreply, NewState};
% ...
%%% Implementation functions
find(ID, Events) ->
    case dict:find(ID, Events) of
        {ok, {Type, Data, Timestamp}} -> {ok, {ID, Timestamp, Type, Data}};
        error                         -> error
    end.
catalog({ID, Timestamp, Type, Data},
        State = #s{log = Log, events = Events}) ->
    NewEvents = dict:store(ID, {Type, Data, Timestamp}, Events),
    NewLog = [ID | Log],
    {ok, State#s{log = NewLog, events = NewEvents}}.
This is a completely straightforward implementation and hides the details of the data structure behind the interface of the process. Why did I pick a dict? Just because (it's easy). Without knowing your requirements better, I really have no reason to pick a dict over a map or a gb_tree, etc. If you have relatively small data (hundreds or thousands of things to store), the performance isn't usually noticeably different among these structures.
The important thing is that you clearly identify what messages this process should respond to and then force yourself to stick to it elsewhere in your project code by creating an interface of exposed functions over this module. Behind that you can swap out the dict for something else. If you really only need the latest event ID and won't ever need to pull the Nth event from the sequence log then you could ditch the log and just keep the last event's ID in the record instead of a list.
So get something very simple like this working first, then determine if it actually suits your need. If it doesn't then tweak it. If this works for now, just run with it -- don't obsess over performance or storage (until you are really forced to).
If you find later on that you have a performance problem switch out the dict and list for something else -- maybe gb_tree or orddict or ETS or whatever. The point is to get something working right now so you have a base from which to evaluate the functionality and run benchmarks if necessary. (The vast majority of the time, though, I find that whatever I start out with as a specced prototype turns out to be very close to whatever the final solution will be.)
Your question makes it clear you want to look up by ID, but it's not entirely clear whether you also want to look up or traverse your data by (or based on) time, and what operations you might want to perform in that regard; you say "remember the order of events", but storing your records with an index on the ID field will accomplish that.
If you only have to look up by ID, then any of the usual suspects will work as suitable storage engines, so ets, gb_trees, and dict, for example, would be good. Don't use mnesia unless you need the transactions and safety and all those good features; mnesia is good, but there is a high performance price to be paid for all that stuff, and it's not clear from your question that you need it.
If you do want to look up or traverse your data by time, then consider an ets table of type ordered_set. If that can do what you need, then it's probably a good choice. In that case you would employ two tables: one set to provide a hash lookup by ID, and another ordered_set to look up or traverse by timestamp.
If you have two different lookup methods like this, there's no getting around the fact that you need two indexes. You could store the whole record in both, or, assuming your IDs are unique, you could store the ID as the data in the ordered_set. Which you choose is really a matter of trading off storage utilisation against read and write performance.
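If it helps to see the shape of the two-index approach, here is a small sketch using ordinary in-memory maps (written in Scala purely for illustration; the answer itself proposes an ets set plus an ordered_set, and all the names below are mine):
import scala.collection.immutable.TreeMap

final case class Event(id: String, timestamp: Long, kind: String, data: String)

final case class EventStore(
  byId:   Map[String, Event],     // hash lookup by ID (the "set" table)
  byTime: TreeMap[Long, String]   // ordered by timestamp, storing only the ID (the "ordered_set" table)
) {
  // Every write updates both indexes.
  def add(e: Event): EventStore =
    EventStore(byId + (e.id -> e), byTime + (e.timestamp -> e.id))

  def lookup(id: String): Option[Event] = byId.get(id)

  // Walk the time index, then resolve the full record via the ID index.
  def latest: Option[Event] = byTime.lastOption.flatMap { case (_, id) => byId.get(id) }
}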
I am currently working on a telecom analytics project and I'm a newbie at query optimisation. Showing the result in the browser takes a full minute, while only 45,000 records need to be accessed. Could you please suggest ways to reduce the time it takes to show the results?
I wrote the following query to find the call duration for persons of an age group:
sigma=0
popn=len(Demo.objects.filter(age_group=age))
card_list=[Demo.objects.filter(age_group=age)[i].card_no
           for i in range(popn)]
for card in card_list:
    dic=Fact_table.objects.filter(card_no=card).aggregate(Sum('duration'))
    sigma+=dic['duration__sum']
avgDur=sigma/popn
The above code is inside a for loop that iterates over the age groups.
The models are as follows:
class Demo(models.Model):
card_no=models.CharField(max_length=20,primary_key=True)
gender=models.IntegerField()
age=models.IntegerField()
age_group=models.IntegerField()
class Fact_table(models.Model):
pri_key=models.BigIntegerField(primary_key=True)
card_no=models.CharField(max_length=20)
duration=models.IntegerField()
time_8bit=models.CharField(max_length=8)
time_of_day=models.IntegerField()
isBusinessHr=models.IntegerField()
Day_of_week=models.IntegerField()
Day=models.IntegerField()
Thanks
Try this:
sigma = 0
demo_by_age = Demo.objects.filter(age_group=age)
popn = demo_by_age.count()  # One
card_list = demo_by_age.values_list('card_no', flat=True)  # Two
dic = Fact_table.objects.filter(card_no__in=card_list).aggregate(Sum('duration'))  # Three
sigma = dic['duration__sum']
avgDur = sigma / popn
A statement like card_list=[Demo.objects.filter(age_group=age)[i].card_no for i in range(popn)] will generate popn separate queries and database hits. The query in the for loop will also hit the database popn times. As a general rule, you should try to minimize the number of queries you use, and you should only select the records you need.
With a few adjustments to your code this can be done in just one query.
There's generally no need to manually specify a primary_key, and in all but some very specific cases it's even better not to define any. Django automatically adds an indexed, auto-incremental primary key field. If you need the card_no field as a unique field, and you need to find rows based on this field, use this:
class Demo(models.Model):
card_no = models.SlugField(max_length=20, unique=True)
...
SlugField automatically adds a database index to the column, essentially making selections by this field as fast as when it is a primary key. This still allows other ways to access the table, e.g. foreign keys (as I'll explain in my next point), to use the (slightly) faster integer field specified by Django, and will ease the use of the model in Django.
If you need to relate an object to an object in another table, use models.ForeignKey. Django gives you a whole set of new functionality that not only makes it easier to use the models, it also makes a lot of queries faster by using JOIN clauses in the SQL query. So for your example:
class Fact_table(models.Model):
card = models.ForeignKey(Demo, related_name='facts')
...
The related_name fields allows you to access all Fact_table objects related to a Demo instance by using instance.facts in Django. (See https://docs.djangoproject.com/en/dev/ref/models/fields/#module-django.db.models.fields.related)
With these two changes, your query (including the loop over the different age_groups) can be changed into a blazing-fast one-hit query giving you the average duration of calls made by each age_group:
from django.db.models import Avg

age_groups = Demo.objects.values('age_group').annotate(duration_avg=Avg('facts__duration'))
for group in age_groups:
    print "Age group: %s - Average duration: %s" % (group['age_group'], group['duration_avg'])
.values('age_group') selects just the age_group field from the Demo's database table. .annotate(duration_avg=Avg('facts__duration')) takes every unique result from values (thus each unique age_group), and for each unique result will fetch all Fact_table objects related to any Demo object within that age_group, and calculate the average of all the duration fields - all in a single query.
I want to get an entity key knowing the entity ID and an ancestor.
The ID is unique within the entity group defined by the ancestor.
It seems to me that this is not possible using the ndb interface. As I understand the datastore, this may be because the operation would require a full index scan.
The workaround I used is to create a computed property in the model that contains the ID part of the key. I'm now able to do an ancestor query and get the key:
class SomeModel(ndb.Model):
    ID = ndb.ComputedProperty(lambda self: self.key.id())

    @classmethod
    def id_to_key(cls, identifier, ancestor):
        return cls.query(cls.ID == identifier,
                         ancestor=ancestor.key).get(keys_only=True)
It seems to work, but are there any better solutions to this problem?
Update
It seems that for the datastore the natural solution is to use full paths instead of identifiers. Initially I thought it would be too burdensome. After reading dragonx's answer I redesigned my application. To my surprise, everything looks much simpler now. Additional benefits are that my entities will use less space and I won't need additional indexes.
I ran into this problem too. I think you do have the solution.
The better solution would be to stop using IDs to reference entities, and store either the actual key or a full path.
Internally, I use keys instead of IDs.
On my REST API, I used to do http://url/kind/id (where id looked like "123") to fetch an entity. I modified that to provide the complete ancestor path to the entity: http://url/kind/ancestor-ancestor-id (789-456-123). I'd then parse that string, generate a key, and then get by key.
Since you have full information about your ancestor and you know your id, you could directly create your key and get the entity, as follows:
my_key = ndb.Key(Ancestor, ancestor.key.id(), SomeModel, id)
entity = my_key.get()
This way you avoid making a query that costs more than a get operation both in terms of money and speed.
Hope this helps.
I want to make a little addition to dragonx's answer.
In my application I use the string representation of keys on the front-end:
str(instance.key())
When I need to make changes to an instance, even if it is a descendant, I use only the string representation of its key. For example, key_str here is an argument from a request to delete an instance:
instance = Kind.get(key_str)
instance.delete()
My solution is to use urlsafe to get the item without worrying about the parent ID:
pk = ndb.Key(Product, 1234)
usafe = LocationItem.get_by_id(5678, parent=pk).key.urlsafe()
# now can get by urlsafe
item = ndb.Key(urlsafe=usafe)
print item
Preface: I don't have experience with rules engines, building rules, modeling rules, implementing data structures for rules, or whatnot. Therefore, I don't know what I'm doing or if what I attempted below is way off base.
I'm trying to figure out how to store and process the following hypothetical scenario. To simplify my problem, say I have a type of game where a user purchases an object, and there could be thousands of possible objects. The objects must be purchased in a specified sequence and only in certain groups. For example, say I'm the user and I want to purchase object F. Before I can purchase object F, I must have previously purchased object A OR (B AND C). I cannot buy F and A at the same time, nor F and B,C. They must be in the sequence the rule specifies: A first, then F later; or B,C first, then F later. I'm not concerned right now with the span of time between purchases, or any other characteristics of the user, just that the purchases are in the correct sequence.
What is the best way to store this information for potentially thousands of objects that allows me to read in the rules for the object being purchased, and then check it against the user's previous purchase history?
I've attempted this, but I'm stuck at trying to implement the groupings such as A OR (B AND C). I would like to store the rules in a database where I have these tables:
Objects
(ID(int),Description(char))
ObjectPurchRules
(ObjectID(int),ReqirementObjectID(int),OperatorRule(char),Sequence(int))
But obviously as you process through the results, without the grouping, you get the wrong answer. I would like to avoid excessive string parsing if possible :). One object could have an unknown number of previously required purchases. SQL or pseudocode snippets for processing the rules would be appreciated. :)
It seems like your problem breaks down to testing whether a particular condition has been satisfied.
You will have compound conditions.
So given a table of items:
ID_Item Description
----------------------
1 A
2 B
3 C
4 F
and given a table of possible actions:
ID_Action VerbID ItemID ConditionID
----------------------------------------
1 BUY 4 1
We construct a table of conditions:
ID_Condition VerbA ObjectA_ID Boolean VerbB ObjectB_ID
---------------------------------------------------------------------
1 OWNS 1 OR MEETS_CONDITION 2
2 OWNS 2 AND OWNS 3
So OWNS means the id is a key to the Items table, and MEETS_CONDITION means that the id is a key to the Conditions table.
This isn't meant to restrict you. You can add other tables with quests or whatever, and add extra verbs to tell you where to look. Or, just put quests into your Items table when you complete them, and then interpret a completed quest as owning a particular badge. Then you can handle both items and quests with the same code.
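To make the evaluation concrete, here is a rough sketch of how those condition rows could be walked recursively in application code (Scala chosen purely for illustration; the types and helper names are mine, but the IDs match the example tables above):
// In-memory mirror of the two verb kinds used in the Conditions table.
sealed trait Operand
case class Owns(itemId: Int) extends Operand
case class MeetsCondition(conditionId: Int) extends Operand

// One row of the Conditions table: VerbA/ObjectA, Boolean, VerbB/ObjectB.
case class Condition(a: Operand, op: String, b: Operand) // op is "AND" or "OR"

def satisfied(op: Operand, owned: Set[Int], conditions: Map[Int, Condition]): Boolean =
  op match {
    case Owns(id) => owned.contains(id)
    case MeetsCondition(cid) =>
      val c = conditions(cid)
      if (c.op == "AND") satisfied(c.a, owned, conditions) && satisfied(c.b, owned, conditions)
      else               satisfied(c.a, owned, conditions) || satisfied(c.b, owned, conditions)
  }

// Condition 1 = OWNS A OR MEETS_CONDITION 2; Condition 2 = OWNS B AND OWNS C.
val conditions = Map(
  1 -> Condition(Owns(1), "OR", MeetsCondition(2)),
  2 -> Condition(Owns(2), "AND", Owns(3))
)

// A user who owns B (2) and C (3) satisfies condition 1, so the "BUY F" action is allowed:
// satisfied(MeetsCondition(1), Set(2, 3), conditions) == true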
This is a very complex problem that I'm not qualified to answer, but I've seen lots of references to it. The fundamental problem is that in games, quests, items, and "stats" for various objects can have non-relational dependencies. This thread may help you a lot.
You might want to pick up a couple of books on the topic, and look into using Lua as a rules processor.
Personally I would do this in code, not in SQL. Each item should be its own class implementing an interface (i.e. IItem). IItem would have a method called OkToPurchase that would determine if it is OK to purchase that item. To do that, it would use one or more of a collection of rules (i.e. HasPreviouslyPurchased(x), CurrentlyOwns(x), etc.) that you can build.
The nice thing is that it is easy to extend this approach with new rules without breaking all the existing logic.
Here's some pseudocode:
bool OkToPurchase()
{
if( HasPreviouslyPurchased('x') && !CurrentlyOwns('y') )
return true;
else
return false;
}
bool HasPreviouslyPurchased( item )
{
return purchases.contains( item )
}
bool CurrentlyOwns( item )
{
return user.Items.contains( item )
}