Yesod Persistent atomic interaction - database

I was completely missing the point of the database connection and rollback feature, so I was using runDB myAction every time, because I didn't realize what was going on. Today I ran some tests to try to understand how the rollback works, and one of them was this:
getTestR :: Handler Text
getTestR = do
    runDB $ insert $ Test 0
    runDB $ do
        forM_ [1..] $ \n -> do
            if n < 10
                then do
                    insert $ Test n
                    return ()
                else undefined
    return "completed"
I got an undefined error at runtime, as expected, and only the first runDB action made it into the database; the second runDB was rolled back, and when I later inserted another record, its id was 9 positions ahead of the last persisted element.
Suppose I have to do two get actions against the database, and I do them in two ways. First I do:
getTestR :: FooId -> BooId -> Handler Text
getTestR fooid booid = do
    mfoo <- runDB $ get fooid
    mboo <- runDB $ get booid
    return "completed"
and then I try:
getTest'R :: FooId -> BooId -> Handler Text
getTest'R fooid booid = do
    (mfoo, mboo) <- runDB $ do
        mfoo <- get fooid
        mboo <- get booid
        return (mfoo, mboo)
    return "completed"
What would be the actual overall difference? I think that in this case database consistency is not an issue, but performance may be (or will Haskell's laziness make them equal, since mfoo and mboo are never used and so are never queried?). These questions probably look like nonsense, but I would like to be sure I don't have gaps in my understanding.

I think you have answered your own question while discussing the two DB actions. runDB has the following signature.
runDB :: YesodDB site a -> HandlerT site IO a
YesodDB is a ReaderT monad transformer. runDB lifts a DB action into a handler (IO) action. In the first example, there are two separate IO actions (not DB actions); in the second snippet, there is only a single DB action. In the first example, one or both actions may succeed, but in the second one you will either get the results of both gets or an error.
As there are two IO actions wrapping two runDBs, the DB interaction is not optimized, since each runDB represents a single action running in its own transaction. In the second, however, the two actions will share the same connection.
You might want to have a look at YesodPersistBackend and use getDBRunner for sharing a connection from a pool.
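To illustrate the single-transaction behaviour, here is a minimal, untested sketch (it reuses the Test entity from the question; the handler name getAtomicR and the value 42 are made up): both gets and the insert run on one connection in one transaction, so an exception anywhere in the block rolls back everything done in it.
getAtomicR :: FooId -> BooId -> Handler Text
getAtomicR fooid booid = do
    runDB $ do
        mfoo <- get fooid
        mboo <- get booid
        case (mfoo, mboo) of
            (Just _, Just _) -> insert_ $ Test 42    -- persisted only if both lookups succeed
            _                -> error "missing row"  -- aborts and rolls back the whole runDB block
    return "completed"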

Related

How to limit amount of associations in Elixir Ecto

I have this app where there is a Games table and a Players table, and they share an n:n association.
This association is mapped in Phoenix through a GamesPlayers schema.
What I'm wondering how to do is actually quite simple: I'd like there to be an adjustable limit of how many players are allowed per game.
If you need more details, carry on reading, but if you already know an answer feel free to skip the rest!
What I've Tried
I've taken a look at adding check constraints, but without much success. Here's roughly what the check constraint would have to look like:
create constraint("games_players", :limit_players, check: "count(players) <= player_limit")
Problem here is, the check syntax is very much invalid and I don't think there actually is a valid way to achieve this using this call.
I've also looked into adding a trigger to the Postgres database directly in order to enforce this (something very similar to what this answer proposes), but I am very wary of directly fiddling with the DB since I should only be using ecto's interface.
Table Schemas
For the purposes of this question, let's assume this is what the tables look like:
Games
    Property        Type
    id              integer
    player_limit    integer
Players
    Property        Type
    id              integer
GamesPlayers
    Property        Type
    game_id         references(Games)
    player_id       references(Players)
As I mentioned in my comment, I think the cleanest way to enforce this is via business logic inside the code, not via a database constraint. I would approach this using a database transaction, which Ecto supports via Ecto.Repo.transaction/2. This will prevent any race conditions.
In this case I would do something like the following:
begin the transaction
perform a SELECT query counting the number of players in the given game; if the game is already full, abort the transaction, otherwise, continue
perform an INSERT query to add the player to the game
complete the transaction
In code, this would boil down to something like this (untested):
import Ecto.Query

alias MyApp.Repo
alias MyApp.GamesPlayers

@max_allowed_players 10

def add_player_to_game(player_id, game_id, opts \\ []) do
  max_allowed_players = Keyword.get(opts, :max_allowed_players, @max_allowed_players)

  Repo.transaction(fn ->
    case is_game_full?(game_id, max_allowed_players) do
      false ->
        %GamesPlayers{
          game_id: game_id,
          player_id: player_id
        }
        |> Repo.insert!()

      # Raising an error causes the transaction to fail and roll back
      true ->
        raise "Game #{inspect(game_id)} full; cannot add player #{inspect(player_id)}"
    end
  end)
end

defp is_game_full?(game_id, max_allowed_players) do
  current_players =
    from(r in GamesPlayers,
      where: r.game_id == ^game_id,
      select: count(r.id)
    )
    |> Repo.one()

  current_players >= max_allowed_players
end
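A hypothetical call site for the sketch above (player and game are assumed to be structs loaded elsewhere) could then look like this; because the full-game branch raises, that case surfaces as an exception rather than an {:error, _} tuple:
# Repo.transaction/1 returns {:ok, value} on success; the raise above propagates to the caller.
{:ok, _games_players} = add_player_to_game(player.id, game.id)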

How do you update a single value in xtdb?

Given the following document
{:xt/id 1
 :line-item/quantity 23
 :line-item/item 20
 :line-item/description "Item line description"}
I want to update the quantity to 25
As far as I can tell, I will need to first query the DB, get the full doc, merge in the change, then transact the new doc back.
Is there a way to merge in just a change in quantity without doing the above?
Thank you
You should be able to use transaction functions for this. These will allow you to specify those multiple steps and push them down into the transaction log to ensure that they execute in-sequence (i.e. that you will always retrieve the latest doc to update against at the point in time the transaction function call itself is pushed into the transaction log).
For your specific example I think it would look something like this (untested):
(xt/submit-tx node [[::xt/put
                     {:xt/id :update-quantity
                      ;; note that the function body is quoted,
                      ;; and function calls are fully qualified
                      :xt/fn '(fn [ctx eid new-quantity]
                                (let [db (xtdb.api/db ctx)
                                      entity (xtdb.api/entity db eid)]
                                  [[::xt/put (assoc entity :line-item/quantity new-quantity)]]))}]])
This creates the transaction function itself, then you just need to call it to make the change:
;; `[[::xt/fn <id-of-fn> <id-of-entity> <new-quantity>]]` -- the `ctx` is automatically-injected
(xt/submit-tx node [[::xt/fn :update-quantity 1 25]])
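As a quick, untested check (assuming the document with :xt/id 1 from the question is already in the node), you can await the transaction and re-read the entity to see the merged change:
(let [tx (xt/submit-tx node [[::xt/fn :update-quantity 1 25]])]
  (xt/await-tx node tx)
  (xt/entity (xt/db node) 1))
;; => {:xt/id 1, :line-item/quantity 25, :line-item/item 20, :line-item/description "Item line description"}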

Records not committed in Camel Route

We have an application that uses Apache Camel and Spring-Data-JPA. We have a scenario where items inserted into the database... disappear. The only good news is that we have an integration test that replicates the behavior.
The Camel route uses direct on it and has the transaction policy of PROPAGATION_REQUIRED. The idea is that we send in an object with a status property, and when we change the status we send the object into a Camel route to record who changed the status and when. It is this StatusChange object that isn't being saved correctly.
Our test creates the object, saves it (which sends it to the route), changes the status, and saves it again. After those two saves we should have two StatusChange objects saved, but we only have one, even though a second is created. All three of these objects (the original and the two StatusChange objects) are Spring-Data-JPA entities managed by JpaRepository objects.
We have a log statement in the service that creates and saves the StatusChanges:
log.debug('Saved StatusChange has ID {}', newStatusChange.id)
So after the first one I see:
Saved StatusChange has ID 1
And the on the re-save:
Saved StatusChange has ID 2
Good! We have the second! And then I see we change the original:
changing [StatusChange#ab2e250f { id: 1, ... }] status change to STATUS_CHANGED
But after the test is done, we only have 1 StatusChange object -- the original with ID:1. I know this because I have this in the cleanup step in my test:
sql.eachRow("select * from StatusChange", { row ->
    println "ID -> ${row['ID']}, Status -> ${row['STATUS']}";
})
And the result is :
ID -> 1, Status -> PENDING
I would expect this:
ID -> 1, Status -> STATUS_CHANGED
ID -> 2, Status -> PENDING
This happens in two steps within the same test, so no rollback should happen between the two saves. So what could cause it to be persisted the first time and not the second time?
The problem was that the service that ran after the Camel route was done threw an exception. It was assumed that the transaction had been committed, but it had not. The transaction was then marked rollback-only when the exception hit, and that is how things disappeared.
The funniest thing -- the exception happened in the service because the transaction hadn't been committed yet. It's a vicious circle.

Power Query M loop table / lookup via a self-join

First of all, I'm new to Power Query, so I'm taking my first steps. But I need to try to deliver something at work soon so I can gain some breathing room to learn.
I have the following table (example):
Orig_Item   Alt_Item
5.7         5.10
79.19       79.60
79.60       79.86
10.10
And I need to create a column that will loop the table and display the final Alt_Item. So the result would be the following:
Orig_Item   Alt_Item   Final_Item
5.7         5.10       5.10
79.19       79.60      79.86
79.60       79.86      79.86
10.10
Many thanks
Actually, this is far too complicated for a first Power Query experience.
If that's what you've got to do, then so be it, but you should be aware that you are starting with a quite difficult task.
Small detail: I would expect the last Final_Item to be 10.10. According to the example, the Final_Item will be null if Alt_Item is null. If that is not correct, well that would be a nice first step for you to adjust the code below accordingly.
You can create a new blank query, copy and paste this code in the Advanced Editor (replacing the default code) and adjust the Source to your table name.
let
    Source = Table.Buffer(Table1),
    AddedFinal_Item =
        Table.AddColumn(
            Source,
            "Final_Item",
            each if [Alt_Item] = null
                then null
                else List.Last(
                    List.Generate(
                        () => [Final_Item = [Alt_Item], Continue = true],
                        each [Continue],
                        each [Final_Item =
                                  Table.First(
                                      Table.SelectRows(
                                          Source,
                                          (x) => x[Orig_Item] = [Final_Item]),
                                      [Alt_Item = "not found"]
                                  )[Alt_Item],
                              Continue = Final_Item <> "not found"],
                        each [Final_Item])))
in
    AddedFinal_Item
This code uses function List.Generate to perform the looping.
For performance reasons, the table should always be buffered in memory (Table.Buffer), before invoking List.Generate.
List.Generate is one of the most complex Power Query functions.
It requires 4 arguments, each of which is a function in itself.
In this case the first argument starts with () and the other 3 with each (it should be clear from the outline above: they are aligned).
Argument 1 defines the initial values: a record with fields Final_Item and Continue.
Argument 2 is the condition to continue: if an item is found.
Argument 3 is the actual transformation in each iteration: the Source table is searched (with Table.SelectRows) for an Orig_Item equal to Alt_Item. This is wrapped in Table.First, which returns the first record (if any is found) and accepts a default value if nothing is found, in this case a record with field Alt_Item and value "not found". From this result the value of record field [Alt_Item] is returned, which is either the value from the first record found, or "not found" from the default value.
If the value is "not found", then Continue becomes false and the iterations will stop.
Argument 4 is the value that will be returned: Final_Item.
List.Generate returns a list of all values from each iteration. Only the last value is required, so List.Generate is wrapped in List.Last.
Final remark: actual looping is rarely required in Power Query and I think it should be avoided as much as possible. In this case, however, it is a feasible solution as you don't know in advance how many Alt_Items will be encountered.
An alternative to List.Generate is using a recursive function.
Also List.Accumulate is close to looping, but that has a fixed number of iterations.
This can be solved simply with a self-join, the open question is how many layers of indirection you'll be expected to support.
Assuming just one level of indirection, no duplicates on Orig_Item, the solution is:
let
    Source = #"Input Table",
    SelfJoin1 = Table.NestedJoin( Source, {"Alt_Item"}, Source, {"Orig_Item"}, "_tmp_" ),
    Expand1 = Table.ExpandTableColumn( SelfJoin1, "_tmp_", {"Alt_Item"}, {"_lkp_"} ),
    ChkJoin1 = Table.AddColumn( Expand1, "Final_Item", each (if [_lkp_] = null then [Alt_Item] else [_lkp_]), type number)
in
    ChkJoin1
This is doable with the regular UI, using Merge Queries, then Expand Column and adding a custom column.
If you want to support more than one level of indirection, turn it into a function to be called X times, as in the sketch below. For data-driven levels of indirection, you wrap the calls in a List.Generate that drops the intermediate tables in a structured column, though that's a much more advanced level of PQ.
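A rough, untested sketch of the fixed-number-of-levels idea (column names follow the query above; the query name ResolveOnce is made up): create a new blank query named ResolveOnce containing one resolution pass as a function.
(tbl as table) as table =>
let
    // join the table to itself on the current Final_Item
    Joined   = Table.NestedJoin( tbl, {"Final_Item"}, tbl, {"Orig_Item"}, "_tmp_" ),
    Expanded = Table.ExpandTableColumn( Joined, "_tmp_", {"Alt_Item"}, {"_lkp_"} ),
    // keep the current value when no further mapping exists
    Stepped  = Table.AddColumn( Expanded, "Final_Item2",
                   each if [_lkp_] = null then [Final_Item] else [_lkp_] ),
    Cleaned  = Table.RemoveColumns( Stepped, {"Final_Item", "_lkp_"} ),
    Renamed  = Table.RenameColumns( Cleaned, {{"Final_Item2", "Final_Item"}} )
in
    Renamed
Invoking it as ResolveOnce( ResolveOnce( ChkJoin1 ) ) would then resolve two fixed levels of indirection.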

What Erlang data structure to use for ordered set with the possibility to do lookups?

I am working on a problem where I need to remember the order of events I receive but also need to look up an event based on its id. How can I do this efficiently in Erlang, if possible without a third-party library? Note that I have many potentially ephemeral actors, each with their own events (I already considered mnesia, but it requires atoms for the tables and the tables would stick around if my actor died).
-record(event, {id, timestamp, type, data}).
Based on the details included in the discussion in comments on Michael's answer, a very simple, workable approach would be to create a tuple in your process state variable that stores the order of events separately from the K-V store of events.
Consider:
%%% Some type definitions so we know exactly what we're dealing with.
-type id()     :: term().
-type type()   :: atom().
-type data()   :: term().
-type ts()     :: calendar:datetime().
-type event()  :: {id(), ts(), type(), data()}.
-type events() :: dict:dict(id(), {type(), data(), ts()}).

% State record for the process.
% Should include whatever else the process deals with.
-record(s,
        {log    :: [id()],
         events :: events()}).

%%% Interface functions we will expose over this module.
-spec lookup(pid(), id()) -> {ok, event()} | error.
lookup(Pid, ID) ->
    gen_server:call(Pid, {lookup, ID}).

-spec latest(pid()) -> {ok, event()} | error.
latest(Pid) ->
    gen_server:call(Pid, get_latest).

-spec notify(pid(), event()) -> ok.
notify(Pid, Event) ->
    gen_server:cast(Pid, {new, Event}).

%%% gen_server handlers
handle_call({lookup, ID}, _From, State = #s{events = Events}) ->
    Result = find(ID, Events),
    {reply, Result, State};
handle_call(get_latest, _From, State = #s{log = [Last | _], events = Events}) ->
    Result = find(Last, Events),
    {reply, Result, State};
% ... and so on...
handle_cast({new, Event}, State) ->
    {ok, NewState} = catalog(Event, State),
    {noreply, NewState};
% ...

%%% Implementation functions
find(ID, Events) ->
    case dict:find(ID, Events) of
        {ok, {Type, Data, Timestamp}} -> {ok, {ID, Timestamp, Type, Data}};
        Error -> Error
    end.

catalog({ID, Timestamp, Type, Data},
        State = #s{log = Log, events = Events}) ->
    NewEvents = dict:store(ID, {Type, Data, Timestamp}, Events),
    NewLog = [ID | Log],
    {ok, State#s{log = NewLog, events = NewEvents}}.
This is a completely straightforward implementation and hides the details of the data structure behind the interface of the process. Why did I pick a dict? Just because (it's easy). Without knowing your requirements better I really have no reason to pick a dict over a map or a gb_tree, etc. If you have relatively small data (hundreds or thousands of things to store) the performance usually isn't noticeably different among these structures.
The important thing is that you clearly identify what messages this process should respond to and then force yourself to stick to it elsewhere in your project code by creating an interface of exposed functions over this module. Behind that you can swap out the dict for something else. If you really only need the latest event ID and won't ever need to pull the Nth event from the sequence log then you could ditch the log and just keep the last event's ID in the record instead of a list.
So get something very simple like this working first, then determine if it actually suits your need. If it doesn't then tweak it. If this works for now, just run with it -- don't obsess over performance or storage (until you are really forced to).
If you find later on that you have a performance problem switch out the dict and list for something else -- maybe gb_tree or orddict or ETS or whatever. The point is to get something working right now so you have a base from which to evaluate the functionality and run benchmarks if necessary. (The vast majority of the time, though, I find that whatever I start out with as a specced prototype turns out to be very close to whatever the final solution will be.)
Your question makes it clear you want to look up by ID, but it's not entirely clear if you want to look up or traverse your data by or based on time, and what operations you might want to perform in that regard; you say "remember the order of events", but storing your records with an index on the ID field will accomplish that.
If you only have to look up by ID then any of the usual suspects will work as suitable storage engines, so ets, gb_trees and dict for example would be good. Don't use mnesia unless you need the transactions and safety and all those good features; mnesia is good, but there is a high performance price to be paid for all that stuff, and it's not clear from your question that you need it.
If you do want to lookup or traverse your data by or based on time, then consider an ets table of ordered_set. If that can do what you need then it's probably a good choice. In that case you would employ two tables, one set to provide a hash lookup by ID and another ordered_set to lookup or traverse by timestamp.
If you have two different lookup methods like this there's no getting around the fact that you need two indexes. You could store the whole record in both, or, assuming your IDs are unique, you could store the ID as the data in the ordered_set. Which you choose is really a matter of trading off storage utilisation against read and write performance.
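A minimal, untested sketch of that two-table layout (the table names and the init/store/lookup function names are made up; the #event{} record is the one from the question):
%% Two ETS tables: a set keyed on the event id for direct lookups, and an
%% ordered_set keyed on {Timestamp, Id} for time-ordered traversal.
init_tables() ->
    ById   = ets:new(events_by_id,   [set, private]),
    ByTime = ets:new(events_by_time, [ordered_set, private]),
    {ById, ByTime}.

store_event({ById, ByTime}, #event{id = Id, timestamp = Ts} = Event) ->
    true = ets:insert(ById,   {Id, Event}),
    true = ets:insert(ByTime, {{Ts, Id}, Id}),
    ok.

lookup_event({ById, _ByTime}, Id) ->
    case ets:lookup(ById, Id) of
        [{Id, Event}] -> {ok, Event};
        []            -> error
    end.

latest_event({_ById, ByTime} = Tabs) ->
    case ets:last(ByTime) of
        '$end_of_table' -> error;
        {_Ts, Id}       -> lookup_event(Tabs, Id)
    end.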
