How to define multiple penalties to minimise overall in clingo?

I am trying to use clingo to generate tournament player-room allocations:
player(1..20).
room(1..4).
played(1..20, 0).
rank(1..20, 1).
played(1..20, 1..20, 0).
0 { used_room(R) } 1 :- room(R).
3 { game(P, R) } 4 :- used_room(R), player(P).
:- game(P, R1), game(P, R2), R1 != R2.
penalty(Y) :- Y = #sum {
    X: game(P1, R), game(P2, R), played(P1, P2, X);
    X: game(P1, R), game(P2, R), rank(P1, R1), rank(P2, R2), abs(R1-R2) = X;
    4 - X: played(P, X), not game(P, _)
}.
#minimize { X: penalty(X) }.
The first 5 lines are supposed to be the "input":
The number of players present is variable
So is the number of rooms available
Each player needs to play 4 rounds throughout the night so we record the number of rounds played by each player so far
Each player has a rank (in the league table), which is updated after every round - ideally players in every room should have similar levels (think ELO)
To discourage the algorithm from putting the same players together all the time, we also keep track of the number of rounds any given pair of players spent together in the same room
The idea is to update these inputs after every round (once the points are in) and feed them back into the solver to produce the next round's allocation.
Then, I tried to add some constraints:
There is a certain number of rooms available but they do not all have to be used. Each room can be either used or unused each round
For any room that is used, it has to have either 3 or 4 players assigned to it (due to the mechanics of the game - 4 is always preferred, 3 is for dealing with edge cases)
No player can be assigned to more than one room for any given round
Finally, I tried defining some "penalties" to guide the solver to pick the best allocations:
For every pair of players P1, P2 that were placed in the same room add X to the penalty where X is the number of times they already played together.
For every pair of players P1, P2 that were placed in the same room add the (absolute) difference in their rank to the penalty.
For every player that still has to play in X more rounds but hasn't been selected for this round, add X to the penalty.
What I meant was for this penalty to accumulate, so that each player with 4 rounds left to play (i.e. every player at the beginning) adds 4 points to the penalty, not just one (which is what happened with this code). In practice, running this yields penalty(4). and no game(player, room). allocations whatsoever.
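I suspect the aggregate's set semantics are the culprit here: #sum ranges over a set of tuples, so every player P with played(P, 0) contributes the identical element 4 - 0 = 4, which is counted only once. Presumably each element needs the player in its tuple to keep the contributions distinct, something like:
4 - X, P : played(P, X), not game(P, _)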
Also, I'd like to have some constraint so that I cannot end up in a situation where some players still have rounds left to play but there are not enough players left (e.g. if you have 1, 2 or 5 players left who just need to play one round). I am not sure what the right invariant is that could guarantee this cannot happen even several rounds ahead. This is more of a logic question than a clingo one. In practice, you have around 3-4 rooms available and around 20-30 players - importantly, there is never a guarantee that the number of players is a multiple of 4.
Something else that's missing from my current "implementation" is a constraint such that for a specific subset of players (let's call them "experts"), at least one of them has to stay out of the current round (and lead it). And in general for each room used, at least one player has to stay out (including the one expert). This should be a hard constraint.
Finally, we'd like to maximise utilisation for the rooms i.e. maximise the number of players per round and minimise the number of rounds overall. This should be a weak constraint (just like the constraints to do with ranks and games played so far together).
Many thanks in advance for any help or advice! Unfortunately, the documentation does not give many sophisticated examples, so I couldn't figure out the right syntax for my use cases.

Writing everything at the start and trying to debug afterwards is difficult in answer set programming. In your case it may be better to first define your search space and then write constraints one by one to remove unwanted answers.
To update inputs after every round you will have to work with "online ASP". You may want to look at https://potassco.org/clingo/ as it contains valuable learning material which could help with your learning.
The encoding below may be a good starting point for you:
%%% constants %%%
#const numberOfRounds = 4.
#const numberOfPlayers = 2.
#const numberOfRooms = 4.
%%% constants %%%
%%% define players and their initial ranks %%%
player(1..numberOfPlayers,1).
%%% define players and their initial ranks %%%
%%% define rooms %%%
room(1..numberOfRooms).
%%% define rooms %%%
%%% define rounds %%%
round(1..numberOfRounds).
%%% define rounds %%%
%%% define search space (all possible values) %%%
search(P,R,S) :- player(P,_), room(R), round(S).
%%% define search space (all possible values) %%%
%%% define played %%%
{played(P,R,S)} :- search(P,R,S).
%%% define played %%%
%%% remove answers that do not satisfy the condition "Each player needs to play 4 rounds" %%%
:- player(P,_), X = #count{S : played(P,_,S)}, X != numberOfRounds.
%%% remove answers that do not satisfy the condition "Each player needs to play 4 rounds" %%%
%%% show output %%%
#show.
#show played/3.
%%% show output %%%

Based on NTP's advice, I tried rewriting again, and now pretty much all constraints are present and seem to work, except for the ranking-based penalty, which I still have to add.
%%% constants %%%
#const nRounds = 3.
#const nPlayers = 4.
#const nRooms = 3.
#const nDecks = 4.
player(1..nPlayers).
room(1..nRooms).
deck(1..nDecks).
writer(1,1;2,2;3,3;4,4).
{ played(P, R, D) } :- player(P), room(R), deck(D).
% A player can only play a deck in a single room.
:- played(P, R1, D), played(P, R2, D), R1 != R2.
% A player must play nRounds decks overall.
:- player(P), X = #count { R, D: played(P, R, D) }, X != nRounds.
% Any deck in any room must be played by 3-4 players.
legal_player_count(3;4).
:- room(R), deck(D),
   X = #count { P: played(P, R, D) },
   X > 0,
   not legal_player_count(X).
% Writers cannot play their own decks.
:- writer(P, D), played(P, _, D).
% At least one non-playing player per room.
:- deck(D),
   Playing = #count { P, R: played(P, R, D) },
   Rooms = #count { R: played(_, R, D) },
   nPlayers - Playing < Rooms.
% Input points(P, R, D, X) to report points.
% winner(P, R, D) :- points(P, R, D, X), X = #max { Y : points(_, R, D, Y) }.
% Total number of decks played throughout the night (for minimisation?)
decks(X) :- X = #count { D: played(_, _, D) }.
% Total number of games played together by the same players (for minimisation)
% The total sum of this predicate is invariant
% Minimisation should take place on a superlinear value (e.g. the square)
common_games(P1, P2, X) :- player(P1), player(P2), P1 != P2,
    X = #count { R, D: played(P1, R, D), played(P2, R, D) }, X > 0.
% For example:
% common_game_penalty(X) :- X = #sum { Y*Y, P1, P2 : common_games(P1, P2, Y) }.
% Another rank-based penalty needs to be added once the rank mechanics are there
% Then the 2 types of penalties need to be combined and / or passed to the optimiser
#show decks/1.
#show played/3.
#show common_games/3.
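One way to combine the penalties once both exist is to pass each one to #minimize directly: clingo sums all terms that share an @-priority level and optimises higher levels first, so no intermediate penalty atom is needed. A sketch, reusing decks/1 and common_games/3 from above (the choice of levels and the squared weight are placeholders to tune):
% hypothetical combination: fewer decks first, then fewer repeated pairings
#minimize { D@2 : decks(D) }.
#minimize { Y*Y@1,P1,P2 : common_games(P1,P2,Y) }.
A rank-based penalty could later join in the same way, either at its own priority level or weighted against an existing one.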


Reducing the search space efficiently in clingo

I am struggling to scale a constraint problem (it breaks down for large values and/or if I try to optimise instead of just looking for any solution). I've taken some steps to break the search space down, based on advice from previous questions, but it's still stalling. Are there any more techniques that could help me optimise this computation?
%%% constants %%%
#const nRounds = 4.
#const nPlayers = 20.
#const nRooms = 4.
#const nDecks = 7.
player(1..nPlayers).
room(1..nRooms).
deck(1..nDecks).
writer(1,1;2,2;3,3;4,4).
% For reference - that's what I started with:
%nRounds { seat(Player, 1..nRooms, 1..nDecks) } nRounds :- player(Player).
% Now instead I'm using a few building blocks
% Each player shall only play nRounds decks
nRounds { what(Player, 1..nDecks) } nRounds :- player(Player).
% Each player shall only play in up to nRounds rooms.
1 { where(Player, 1..nRooms) } nRounds :- player(Player).
% For each deck, 3 or 4 players can play in each room.
3 { who(1..nPlayers, Room, Deck) } 4 :- room(Room), deck(Deck).
% Putting it all together, hopefully, this leads to fewer combinations than the original monolithic choice rule.
{ seat(Player, Room, Deck) } :- what(Player, Deck), where(Player, Room), who(Player, Room, Deck).
% A player can only play a deck in a single room.
:- seat(Player, Room1, Deck), seat(Player, Room2, Deck), Room1 != Room2.
% A player must play nRounds decks overall.
:- player(Player), #count { Room, Deck: seat(Player, Room, Deck) } != nRounds.
% Any deck in any room must be played by 3-4 players.
legal_player_count(3..4).
:- room(Room), deck(Deck),
   Players = #count { Player: seat(Player, Room, Deck) },
   Players > 0,
   not legal_player_count(Players).
% Writers cannot play their own decks.
:- writer(Player, Deck), seat(Player, _, Deck).
% At least one non-playing player per room.
:- deck(Deck),
   Playing = #count { Player, Room: seat(Player, Room, Deck) },
   Rooms = #count { Room: seat(_, Room, Deck) },
   nPlayers - Playing < Rooms.
%:- room(R1), deck(D), room(R2), X = #sum { P: seat(P, R1, D) }, Y = #sum { P: seat(P, R2, D) }, R1 > R2, X > Y.
#minimize { D: decks(D) }.
#show decks/1.
#show seat/3.
% #show common_games/3.
When, or if, this becomes manageable I am hoping to add more optimisation objectives to choose the best configurations along the lines of:
% Input points(P, R, D, X) to report points.
% winner(P, R, D) :- points(P, R, D, X), X = #max { Y : points(_, R, D, Y) }.
% Compute each player's rank based on each round:
% rank(P, D, R) :- points(P, Room, D, X), winner(Winner, Room, D), D_ = D - 1,
% rank(P, D_, R_),
% R = some_combination_of(X, P=Winner, R_).
% latest_rank(P, R) :- D = #max { DD: rank(P, DD, _) }, rank(P, D, R).
% Total number of decks played throughout the night (for minimisation?)
decks(Decks) :- Decks = #count { Deck: seat(_, _, Deck) }.
% Total number of games played together by the same players (for minimisation)
% The total sum of this predicate is invariant
% Minimisation should take place on a superlinear value (e.g. the square)
common_games(Player1, Player2, Games) :-
    player(Player1), player(Player2), Player1 != Player2,
    Games = #count { Room, Deck:
        seat(Player1, Room, Deck),
        seat(Player2, Room, Deck)
    }, Games > 0.
% For example:
% common_game_penalty(X) :- X = #sum { Y*Y, P1, P2 : common_games(P1, P2, Y) }.
% Another rank-based penalty needs to be added once the rank mechanics are there
% Then the 2 types of penalties need to be combined and / or passed to the optimiser
Update - Problem description
P players gather for a quiz night. D decks and R rooms are available to play.
Each room can only ever host either 3 or 4 players (due to the rules of the game, not space).
Each deck is played at most once, and it is played in multiple rooms simultaneously - so in a sense "deck" is synonymous with "round".
Each player can only play the same deck at most once.
Each player only gets to play N times during the night (N is pretty much fixed at 4). So if 9 decks are played during the night (i.e. if there are lots of players present), each player will play 4 out of these 9.
Therefore, it is not necessary for each player to play in each deck/round. In fact, for each deck there is a writer, and it is usually one of the players present. Naturally, the writer cannot play their own deck, so they have to stay out for that round. Additionally, for each deck/round, somebody must read the questions in each room, so if 16 players are present and there are 4 rooms, it is impossible for all 16 players to play. It is possible to have 4 rooms with 3 players each (and the remaining 4 players read out the questions) or to have 3 rooms with 4 players each (with 3 players reading out the questions and 1 spectating).
Hopefully this clears up the confusion; if not, I can try to give more elaborate examples. Basically, say you have 4 rooms and 30 players:
You pick 16 who'll play and 4 more who'll read out the questions
Then you have 16 people who played their 1/4 deck/rounds and 14 who are still at 0/4
So then you can either let the other 14 people play (4,4,3,3 players per room) or continue maximising the room utility so that after the second round everyone played at least once and 2/30 players have already played 2/4 games.
So then you continue picking some number of people until everyone has played exactly 4 decks/rounds.
P.S. There are 2 notions of "round": one at the personal level, where everyone has 4 to play, and one at the league level, where there is some number of decks > 4 and each deck is considered "a round" in the eyes of everyone present. From what I understand, this was the most confusing bit about the setup, and I didn't clarify it well at the beginning.
I have rewritten the encoding to your new specification, without too many optimizations, to get the problem straight.
Remarks:
I assume that the one who "reads the questions" is the writer?
I made sure that there is 1 writer per room available, but I didn't name them.
#const nPlayers = 20.
#const nRooms = 4.
#const nDecks = 6.
player(1..nPlayers).
room(1..nRooms).
deck(1..nDecks).
% player P plays in room R in round D
{plays(P,R,D)} :- deck(D), room(R), player(P).
% a player may only play in a single room each round
:- player(P), deck(D), 1 < #sum {1,R : plays(P,R,D)}.
% not more than 4 players per room
:- deck(D), room(R), 4 < #sum {1,P : plays(P,R,D)}.
% not less than 3 players per room
:- deck(D), room(R), 3 > #sum {1,P : plays(P,R,D)}.
plays(P,D) :- plays(P,R,D).
% at least one writer per room (we need at least one player not playing for each room, we do not care who does it)
:- deck(D), nRooms > #sum {1,P : not plays(P,D), player(P)}.
% each player only plays 4 times during the night
:- player(P), not 4 = #sum {1,D : plays(P,D)}.
#show plays/3.
%%% shortcut if too many decks are used; each player can only play 4 times but at least 3 players have to play in a room (currently there is no concept of an empty room)
:- 3*nRooms*nDecks > nPlayers*4.
Note that I added the last constraint, as your initial configuration was not solvable: each player has to play exactly 4 rounds and we have twenty players, which is 80 individual games. Given that at least 3 players have to be in a room, and we have 4 rooms and 7 decks, that is 3 * 4 * 7 = 84, so we would need to play at least 84 individual games. You could probably also compute the number of decks, I think.
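A sketch of that last idea: each deck that is played consumes at least 3*nRooms of the nPlayers*4 individual games available, so an upper bound on the number of decks can be derived instead of hand-tuned (maxDecks/1 is a hypothetical predicate; clingo evaluates the integer arithmetic while grounding):
% with the constants above: 20*4/(3*4) = 80/12 = 6 decks at most
maxDecks(nPlayers*4/(3*nRooms)).
deck(1..M) :- maxDecks(M). % in place of deck(1..nDecks)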

AI: evaluate the mass of a spaceship via a prod (exert force lightly) and sense the change in its velocity

Problem
I have to code an AI to find the mass of a spaceship in a game.
My AI can exert a small force c on the spaceship, to measure the mass via the change in velocity.
However, my AI can only access the current position of the spaceship, x, at every time-step.
The mass is not constant, but it is safe to assume that it will not change too fast.
For simplicity:
Let the space be 1D, with no gravity.
The timestep is always 1 second.
Forces
There are many forces currently acting on the spaceship, e.g. gravity, an automatic propulsion system controlled by an unknown AI, collision impulses, etc.
The sum of these forces is b, which depends on t (time).
The acceleration a for a given timestep is calculated by a game-play formula which is out of my control:
a = (b+c)/m ................. (1)
The velocity v is updated as:
v = vOld + a ................. (2)
The position x is updated as:
x = xOld + v ................. (3)
The order of execution of (1)-(3) is also unknown, i.e. the AI should not rely on any particular order.
My poor solution
I will exert c0 = 0.001 for a few seconds and compare the result against when I exert c1 = -0.001.
I would assume that b and m are constant for the time period.
I calculate acceleration via:
t: 0   1   2   3    (exert force c0 at t1, c1 at t2)
x: x0  x1  x2  x3   (the points in the timeline at which x is sampled)
v:   v0  v1  v2     (v0 = x1-x0, v1 = x2-x1, ...)
a:     a0  a1       (a0 = v1-v0, ...)
Now I know the acceleration at 2 points in the timeline, and I know c because I am the one who exerts it.
With a = (b+c)/m, with unknown b and m, and known a0, a1, c0 and c1:
a0 = (b+c0)/m
a1 = (b+c1)/m
I can solve them to find b and m.
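For reference, subtracting the second equation from the first gives an explicit solution (assuming a0 != a1):
m = (c0 - c1) / (a0 - a1)
b = (a1*c0 - a0*c1) / (a0 - a1)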
However, my assumption was wrong from the start:
b and m are actually not constants.
This problem might be viewed in a more casual way:
Many people are trying to lift a heavy rock.
I am one of them.
How can I measure the mass of the rock (from the feeling in my hand) without interrupting them too much?

Interest Rate in Value Iteration Algorithm

In the chapter about the Value Iteration algorithm for calculating optimal policies for MDPs, there is an algorithm:
function Value-Iteration(mdp, ε) returns a utility function
  inputs: mdp, an MDP with states S, actions A(s), transition model P(s'|s,a),
          rewards R(s), discount γ
          ε, the maximum error allowed in the utility of any state
  local variables: U, U', vectors of utilities for states in S, initially zero
                   δ, the maximum change in the utility of any state in an iteration
  repeat
    U ← U'; δ ← 0
    for each state s in S do
      U'[s] ← R(s) + γ max(a in A(s)) ∑ over s' (P(s'|s,a) U[s'])
      if |U'[s] - U[s]| > δ then δ ← |U'[s] - U[s]|
  until δ < ε(1-γ)/γ
  return U
(I apologize for the formatting, but I need 10 rep to post picture and $latex formatting$ doesn't seem to work here.)
and also a chapter earlier there was a statement:
A discount factor of γ is equivalent to an interest rate of (1/γ) − 1.
Could anyone explain to me what the interest rate (1/γ) - 1 means? How did they get it? Why is it used in the termination condition of the algorithm above?
The reward at t-1 is considered discounted by a factor γ relative to the reward at t. That is to say, old = γ * new. So new = (1/γ) * old, and new - old = ((1/γ) - 1) * old. That is your interest rate. For example, γ = 0.9 corresponds to an interest rate of 1/0.9 - 1 ≈ 0.11, i.e. about 11% per step.
I am not so sure why it is used in the termination condition. The value of epsilon is arbitrary, anyway.
In fact, I believe this termination criterion is very bad. It does not work when γ = 1: the threshold ε(1-γ)/γ becomes 0, so many iterations are necessary and the loop need never terminate. And when γ = 0, the iteration should stop immediately, since a single sweep already yields perfect values.
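For completeness, the standard justification for the condition (this is the bound shown in Russell & Norvig): the Bellman update is a contraction by a factor of γ, which yields ||U_{t+1} - U*|| ≤ (γ/(1-γ)) * ||U_{t+1} - U_t||. So stopping once the largest change δ satisfies δ < ε(1-γ)/γ guarantees that the returned utilities are within ε of the true ones, which is exactly the guarantee the ε parameter promises.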

smallest subset of array whose sum is equal to key. Condition: values can be used any number of times

I was asked this question in an interview.
Given a list of 'N' coins, their values being in an array A[], return the minimum number of coins required to sum to 'S' (you can use as many coins as you want). If it's not possible to sum to 'S', return -1.
Note that I can use the same coin multiple times.
Example:
Input #00:
Coin denominations: { 1,3,5 }
Required sum (S): 11
Output #00:
3
Explanation:
The minimum number of coins required is 3: 5 + 5 + 1 = 11.
Is there any better way than sorting the array and starting from both ends?
This is the change-making problem.
A simple greedy approach, which you seem to be thinking of, won't always produce an optimal result. If you elaborate a bit on what exactly you mean by starting from both ends, I might be able to come up with a counter-example.
It has a dynamic programming approach, taken from here:
Let C[m] be the minimum number of coins of denominations d1,d2,...,dk needed to make change for amount m. In the optimal solution to making change for amount m, there must exist some first coin di, where di ≤ m. Furthermore, the remaining coins in the solution must themselves be the optimal solution to making change for m - di.
Thus, if di is the first coin in the optimal solution to making change for amount m, then C[m] = 1 + C[m - di], i.e. one di coin plus C[m - di] coins to optimally make change for m - di. We don't know which coin di is the first coin; however, we may check all n such possibilities (subject to the constraint that di ≤ m), and the value of the optimal solution must correspond to the minimum value of 1 + C[m - di], by definition.
Furthermore, when making change for 0, the value of the optimal solution is clearly 0 coins. We thus have the following recurrence:
C[p] = 0                                        if p = 0
C[p] = min over i with di ≤ p of (1 + C[p - di])   if p > 0
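For the example above (denominations {1, 3, 5}, S = 11) the table fills in as
C[1]=1, C[2]=2, C[3]=1, C[4]=2, C[5]=1, C[6]=2, C[7]=3, C[8]=2, C[9]=3, C[10]=2, C[11]=3,
matching the expected output of 3 coins (5 + 5 + 1).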
Pathfinding algorithms (Dijkstra, A*, meet-in-the-middle, etc.) could be suitable for this on a graph like the following: nodes are running totals, and each node has one outgoing edge per denomination, so 0 is connected to 1, 3 and 5; 1 is connected to 2, 4 and 6; and so on. Every edge costs one coin, so a shortest path from 0 to S gives the minimum number of coins.
Another way is recursive bisection: say, if we cannot get the sum S with one coin, we try to get the amounts (S/2, S/2) ... (S-1, 1) recursively, until we find a suitable coin or reach S = 1.

Information Gain and Entropy

I recently read this question regarding information gain and entropy. I think I have a semi-decent grasp of the main idea, but I'm curious what to do with situations such as the following:
If we have a bag of 7 coins, 1 of which is heavier than the others and 1 of which is lighter than the others, and we know the heavier coin plus the lighter coin weigh the same as 2 normal coins, what is the information gain associated with picking two random coins and weighing them against each other?
Our goal here is to identify the two odd coins. I've been thinking this problem over for a while, and can't frame it correctly in a decision tree, or any other way for that matter. Any help?
EDIT: I understand the formula for entropy and the formula for information gain. What I don't understand is how to frame this problem in a decision tree format.
EDIT 2: Here is where I'm at so far:
Assuming we pick two coins and they both end up weighing the same, we can assume our new chances of picking H+L come out to 1/5 * 1/4 = 1/20, easy enough.
Assuming we pick two coins and the left side is heavier. There are three different cases where this can occur:
HM: Which gives us 1/2 chance of picking H and a 1/4 chance of picking L: 1/8
HL: 1/2 chance of picking high, 1/1 chance of picking low: 1/1
ML: 1/2 chance of picking low, 1/4 chance of picking high: 1/8
However, the odds of us picking HM are 1/7 * 5/6 which is 5/42
The odds of us picking HL are 1/7 * 1/6 which is 1/42
And the odds of us picking ML are 1/7 * 5/6 which is 5/42
If we weight the overall probabilities with these odds, we are given:
(1/8) * (5/42) + (1/1) * (1/42) + (1/8) * (5/42) = 3/56.
The same holds true for option B.
option A = 3/56
option B = 3/56
option C = 1/20
However, option C should be weighted heavier because there is a 5/7 * 4/6 chance to pick two mediums. So I'm assuming from here I weight THOSE odds.
I am pretty sure I've messed up somewhere along the way, but I think I'm on the right path!
EDIT 3: More stuff.
Assuming the scale is unbalanced, the odds are (10/11) that only one of the coins is the H or L coin, and (1/11) that both coins are H/L
Therefore we can conclude:
(10 / 11) * (1/2 * 1/5) and
(1 / 11) * (1/2)
EDIT 4: Going to go ahead and say that it is a total 4/42 increase.
You can construct a decision tree from information-gain considerations, but that's not the question you posted, which is only to compute the information gain (presumably the expected information gain;-) from one "information extraction move" -- picking two random coins and weighing them against each other. To construct the decision tree, you need to know what moves are available from the initial state (presumably the general rule is: you can pick two sets of N coins, N < 4, and weigh them against each other -- and that's the only kind of move, parametric over N), the expected information gain from each, and that gives you the first leg of the decision tree (the move with highest expected information gain); then you do the same process for each of the possible results of that move, and so on down.
So do you need help to compute that expected information gain for each of the three allowable values of N, only for N==1, or can you try doing it yourself? If the third possibility obtains, then that would maximize the amount of learning you get from the exercise -- which after all IS the key purpose of homework. So why don't you try, edit your answer to show you how you proceeded and what you got, and we'll be happy to confirm you got it right, or try and help correct any misunderstanding your procedure might reveal!
Edit: trying to give some hints rather than serving the OP the ready-cooked solution on a platter;-). Call the coins H (for heavy), L (for light), and M (for medium -- five of those). When you pick 2 coins at random you can get (out of 7 * 6 == 42 possibilities including order) HL, LH (one each), HM, MH, LM, ML (5 each), MM (5 * 4 == 20 cases) -- 2 plus 20 plus 20 is 42, check. In the weighing you get 3 possible results, call them A (left heavier), B (right heavier), C (equal weight). HL, HM, and ML, 11 cases, will be A; LH, MH, and LM, 11 cases, will be B; MM, 20 cases, will be C. So A and B aren't really distinguishable (which one is left and which one is right is basically arbitrary!), so we have 22 cases where the weights will differ and 20 where they will be equal -- it's a good sign that the cases giving each result are pretty close in number!
So now consider how many (equiprobable) possibilities existed a priori, and how many a posteriori, for each of the experiment's results. You're tasked with picking the H and L coins. If you did it at random before the experiment, what would be your chances? 1 in 7 for the random pick of the H; given that that succeeds, 1 in 6 for the pick of the L -- overall, 1 in 42.
After the experiment, how are you doing? If C, you can rule out those two coins and you're left with a mystery H, a mystery L, and three Ms -- so if you picked at random you'd have 1 in 5 to pick H and, if successful, 1 in 4 to pick L, overall 1 in 20 -- your success chances have slightly more than doubled. It's trickier to see "what next" for the A (and equivalently B) cases because they're several, as listed above (and, less obviously, not equiprobable...), but obviously you won't pick the known-lighter coin for H (and vice versa), and if you pick one of the 5 unweighed coins for H (or L), only one of the weighed coins is a candidate for the other role (L or H respectively). Ignoring for simplicity the "non equiprobable" issue (which is really kind of tricky), can you compute what your chances of guessing (with a random pick not inconsistent with the experiment's result) would be...?
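For concreteness, one way to put numbers on this (a sketch of the standard computation, framed for a fixed pick of two coins): a priori there are 42 equiprobable (H, L) assignments, i.e. log2(42) ≈ 5.39 bits of uncertainty. The weighing outcome partitions those 42 states into blocks of 11 (A), 11 (B) and 20 (C), and the posterior within each block stays uniform, so the expected information gain is (11/42)*log2(42/11) + (11/42)*log2(42/11) + (20/42)*log2(42/20) ≈ 1.52 bits.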
