Data structure to find the best skill-up path in a computer game

I'm a World of Warcraft player and I'm thinking about writing a tool to try and find the cheapest skill-up path in a crafting profession.
In WoW, to level up a crafting profession you need to craft items from that profession, obviously. Those recipes have both a minimum skill requirement and a material list. Recipes stop giving level-ups at an arbitrary level, so I cannot craft the same item from level 1 to 100.
So, let's say that I want to level Glabberwoking (not a real profession) from level 10 to level 15. When my character reaches 15 I'll get access to new recipes. At level 10, I have the option to craft from these recipes:
Foo:
- cost: 2
- min level: 5
- max level: 15
Bar
- cost: 1
- min level: 5
- max level: 20
FooBar:
- cost: 1.5
- min level: 15
- max level: 30
Where:
- cost is the material cost for crafting a single item,
- min level is the level required to learn that recipe,
- max level is the level at which the player stops getting level-ups from crafting that recipe.
For the example above, the obvious choice is to craft 5x Bar (5 levels at a cost of 1 each, versus 2 each for Foo) until I reach level 15, when FooBar becomes available.
My first thought was to model this as a multigraph. Each node would represent a profession level where a new recipe becomes available. The edges between them would represent the different recipes that can give skill-ups between two profession levels, and the edge weights would represent the cost of materials. Dijkstra works on multigraphs, so I thought I was done.
The problem is that this doesn't work. Farther along the profession tree we can get to the following situation:
SuperFoo:
- cost: 2 + 1 x Foo
- min level: 50
- max level: 70
Supercalifragilisticexpialidocious Foobar:
- cost: 10
- min level: 50
- max level: 75
So, overall it is better to craft 5x Foo from 10 to 15, because between 50 and 70 the cheapest option is to craft SuperFoo. But the edge from 50 to 70 through SuperFoo only exists if I actually crafted Foo earlier. Foo might be available to purchase instead of crafting, but that would change the total cost, which is exactly what I'm trying to minimize.
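To make this concrete, here is a minimal sketch in Python of the direction I suspect is needed: run Dijkstra over expanded states (skill level, inventory of crafted intermediates) instead of bare levels. The recipe encoding and the assumption that each craft grants exactly one skill point are mine for illustration, and the inventory dimension blows up quickly without pruning:

import heapq
from itertools import count

# Each recipe: gold cost, intermediates consumed, min level, max level.
# Crafting "name" also puts one "name" into the inventory, so a later
# recipe such as SuperFoo can consume the Foo crafted earlier.
RECIPES = {
    "Foo": (2.0, {}, 5, 15),
    "Bar": (1.0, {}, 5, 20),
    "FooBar": (1.5, {}, 15, 30),
    "SuperFoo": (2.0, {"Foo": 1}, 50, 70),
}

def cheapest_path(start_level, goal_level):
    # Dijkstra over states (level, frozenset of crafted-item counts).
    tie = count()  # tie-breaker so the heap never compares inventories
    start = (start_level, frozenset())
    best = {start: 0.0}
    heap = [(0.0, next(tie), start)]
    while heap:
        cost, _, state = heapq.heappop(heap)
        if cost > best.get(state, float("inf")):
            continue  # stale heap entry
        level, inv = state
        if level >= goal_level:
            return cost
        bag = dict(inv)
        for name, (gold, needs, lo, hi) in RECIPES.items():
            if not (lo <= level < hi):
                continue  # recipe not learned yet, or no longer skills up
            if any(bag.get(item, 0) < n for item, n in needs.items()):
                continue  # missing crafted intermediates
            new_bag = dict(bag)
            for item, n in needs.items():
                new_bag[item] -= n
            new_bag[name] = new_bag.get(name, 0) + 1
            nxt = (level + 1,
                   frozenset((k, v) for k, v in new_bag.items() if v > 0))
            if cost + gold < best.get(nxt, float("inf")):
                best[nxt] = cost + gold
                heapq.heappush(heap, (cost + gold, next(tie), nxt))

print(cheapest_path(10, 15))  # -> 5.0, i.e. 5x Bar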
Any suggestions on which data structures/algorithms I could use?
Thanks a lot!

Related

How do Clash of Clans and other games rank + match-up their players so effectively?

So, a few days ago I worked on a pretty standard Elo system. A player can attack another player's defense, and based on the result, points are redistributed.
It started out working pretty well, but eventually the lower-ranked players could not find match-ups, because the way the system finds a match-up is:
- Start at the rank #1 player
- Go down the ranks until it finds the player who requested the match-up
Obviously this resulted in having to request data for 2,000 players if the requesting player was near rank 2,000.
So now the low-ranked players are finding it impossible to request a match-up.
I noticed in other popular titles like Summoners War and Clash of Clans, your ranking is instantly visible whenever your points are adjusted, and I can't possibly imagine them going down the list of every single player until they've reached #200,000.
I'm unable to use the first strategy that came to mind, a binary search (the number-guessing game): if there are 100 players, check #50's points; if your points are lower, check #75's; still lower, check #88's; and so on. This is because I cannot check #50 without also checking #2, #3, etc., due to the nature of the OrderedDataStore.
I'm trying to store data in such a way that:
The top 50 players can be displayed
An algorithm can quickly find 4 defenses of players near your rank
You can view your own rank
Any solutions?
You should be able to get the top 50 players like so (assuming your ranking list is simply called "ranks"):
ranks:GetSortedAsync(false, 50):GetCurrentPage() -- this will just be an array of the top 50 players
Finding specific players near your rank should be as easy as the following:
local playerRating -- set this to your player's Elo rating
local searchWidth = 5
local nearby = ranks:GetSortedAsync(false, 100, playerRating - searchWidth, playerRating + searchWidth)
--[[ nearby is a DataStorePages of all players within searchWidth Elo points of your player;
you may want to widen the search if matches are not found, which is why
searchWidth is a variable instead of a magic number ]]
Getting rank is indeed the hardest part; I think the solution might actually be along the lines of what you said:
local playerRating -- set to player's Elo rating
local playerRank = 0
local allRanks = ranks:GetSortedAsync(false, 100, playerRating)
while true do
    playerRank = playerRank + #allRanks:GetCurrentPage() -- count every page, including the last
    if allRanks.IsFinished then break end
    allRanks:AdvanceToNextPageAsync()
end
Due to throttling, it might be best to cache the top 50 in its own data store whenever the store is updated.
You are correct that they're probably not doing a linear search through all of the ranks, but it appears that's necessary here.

How do I find the index of the first occurrence of a score in a sorted Redis set?

I'm writing a search engine using Dewey Decimal call numbers to categorize information. The scheme is as follows:
123.45
2 is a sub-category of 1.
3 is a sub-category of 2.
4 is a sub-category of 3. Etc.
It's the same numbering system libraries use to sort their books. 200 for example is religion. 210 is Philosophy & Theory of Religion. 211 is Concepts of God.
The site is one continuous catalog that goes from one subject to the next. Each link is given a score in Redis (the link's Dewey Decimal call number). The site is set up for 50 links per page. I've got a function that calculates the zrange to pull from the server depending on the page the user is accessing.
Is there a way I can specify a score and find the index of the first occurrence matching that score, so that I don't have to iterate the entire database looking for scores when users enter a call number?
You can use ZCOUNT from negative infinity up to, but excluding, your specific score to find the index.
For example, in the following sorted set:
> ZADD dewey 1 first 3 second 8 third 21 fourth 55 fifth
(integer) 5
To find the index of 21 (the fourth number):
> ZCOUNT dewey -inf (21
(integer) 3
Just keep in mind that this is a 0-based index.
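From application code the same trick is one call. A minimal sketch in Python with redis-py (key name and data mirror the example above):

import redis

r = redis.Redis()
r.zadd("dewey", {"first": 1, "second": 3, "third": 8, "fourth": 21, "fifth": 55})

def index_of_score(key, score):
    # Count members strictly below `score`; that count is the
    # 0-based index of the first member carrying `score`.
    return r.zcount(key, "-inf", "(%s" % score)

print(index_of_score("dewey", 21))  # -> 3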

Algorithm for evenly spacing list items (playlist songs) along several categories (id3 tags)

I am having trouble designing an algorithm to assist in the creation of an mp3 playlist, although an algorithm for the more general case of evenly spacing items in a list could be adapted for my use.
The general case is that I would like to reorder items in a list to maximize their diversity along several axes.
My specific use-case is that I want to dump a bunch of songs into a collection, then run my algorithm over the collection to generate an ordered playlist. I would like the order to follow this set of criteria:
maximize the distance between instances of the same artist
maximize the distance between instances of the same genre
maximize the distance between instances of category X
etc for N categories
Obviously we could not guarantee to optimize ALL categories equally, so the first category would be weighted most important, second weighted less, etc. I definitely want to satisfy the first two criteria, but making the algorithm extensible to satisfy N would be fantastic. Maximizing randomness (shuffle) is not a priority. I just want to diversify the listening experience no matter where I come in on the playlist.
This seems close to the problem described and solved here, but I can't wrap my head around how to apply this when all of the items are in the same list with multiple dimensions, rather than separate lists of differing sizes.
This seems like a problem that would have been solved many times by now but I am not able to find any examples of it.
This should be much faster than brute-force:
1. Order all the songs randomly.
2. Compute the weight for each song slot (i.e., how close it is to the same artist/genre/etc.). It will be a number from 1 to N indicating how many songs away the nearest match is. Lower is worse.
3. Take the song with the lowest weight, and swap it with a random other song.
4. Re-compute the weights of the swapped songs. If either got worse, reverse the swap and go back to 3.
5. For debugging, print the lowest weight and the overall average weight.
6. Go to 2.
You won't find the optimum this way, but it should give mediocre results pretty fast, and eventually improve.
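Here is a rough Python rendering of that loop, a sketch rather than tuned code; it assumes song objects carrying artist and genre attributes, and its helpers mirror the Ruby pseudocode below:

import random

def matches(playlist, x, y):
    # Do the songs in slots x and y share an artist or genre?
    # Handles out-of-bounds slots.
    n = len(playlist)
    if not (0 <= x < n and 0 <= y < n):
        return False
    a, b = playlist[x], playlist[y]
    return a.artist == b.artist or a.genre == b.genre

def closest_match(playlist, slot, max_step=20):
    # Distance to the nearest matching song; lower means a worse slot.
    for step in range(1, max_step + 1):
        if matches(playlist, slot + step, slot) or matches(playlist, slot - step, slot):
            return step
    return max_step

def improve(playlist, iterations=10000):
    n = len(playlist)
    for _ in range(iterations):
        # Step 2: weight every slot.
        weights = [closest_match(playlist, i) for i in range(n)]
        worst = min(range(n), key=lambda i: weights[i])
        # Step 3: swap the worst slot with a random other slot.
        other = random.randrange(n)
        playlist[worst], playlist[other] = playlist[other], playlist[worst]
        # Step 4: undo the swap if either slot got worse.
        if (closest_match(playlist, worst) < weights[worst]
                or closest_match(playlist, other) < weights[other]):
            playlist[worst], playlist[other] = playlist[other], playlist[worst]
    return playlist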
Step 2 can be made fast this way (pseudocode in Ruby):
# Find the closest match to a song in slot_number
def closest_match(slot_number)
  # Note: MAX can be less than N. Maybe nobody cares about songs more than 20 steps away.
  (1..MAX).each do |step|
    return step if matches?(slot_number + step, slot_number) or matches?(slot_number - step, slot_number)
  end
  return MAX
end

# Given 2 slots, do the songs there match?
# Handles out-of-bounds
def matches?(x, y)
  return false if y > N or y < 1
  return false if x > N or x < 1
  s1 = song_at(x)
  s2 = song_at(y)
  return true if s1.artist == s2.artist or s1.genre == s2.genre
  return false
end
You also don't have to re-compute the whole array: if you cache the weights, you only need to re-compute songs that have weight >= X when they are X steps away from a swapped song. Example:
| Song1 | Song2 | Song3 | Song4 | Song5 |
| Weight=3 | Weight=1 | Weight=5 | Weight=3 | Weight=2 |
If you are swapping Song 2, you don't have to re-compute Song 5: it's 3 steps away from Song 2, but its weight was 2, so it won't "see" Song 2.
Your problem is probably NP-hard. To get a sense of why, here's a reduction to CLIQUE (an NP-hard problem). That doesn't prove that your problem is NP-hard, but it at least gives an idea that there is a connection between the two problems. (To show definitively that your problem is NP-hard, you need a reduction in the other direction: show that CLIQUE can be reduced to your problem. I feel that this is possible, but getting the details right is fussy.)
Suppose you have n=6 songs, A, B, C, D, E, and F. Lay them out in a chart like this:
1 2 3 4 5 6
A A A A A A
B B B B B B
C C C C C C
D D D D D D
E E E E E E
F F F F F F
Connect each item in column 1 with an edge to every other item in every other column, except for items in the same row. So A in column 1 is connected to B, C, D, E, F in column 2, to B, C, D, E, F in column 3, and so on. There are n^2 = 36 nodes in the graph and, counting one edge per pair of columns and pair of distinct rows, (n*(n-1)/2) * n*(n-1) = O(n^4) edges.
A playlist is a maximum clique in this graph, in other words a selection which is mutually consistent (no song is played twice). So far, not so hard: it's possible to find many maximum cliques very quickly (just permutations of the songs).
Now we add information about the similarity of the songs as edge weights. Two songs that are similar and close get a low edge weight. Two songs that are similar and far apart get a higher edge weight. Now the problem is to find a maximum clique with maximum total edge weight, in other words the NP-hard problem CLIQUE.
There are some algorithms for attacking CLIQUE, but of course they're exponential in time. The best you're going to be able to do in a reasonable amount of time is either to run one of those algorithms and take the best result it can generate in that time, or to randomly generate permutations for a given amount of time and pick the one with the highest score. You might be able to get better results for natural data using something like simulated annealing to solve the optimization problem, but CLIQUE is "hard to approximate" so I have the feeling you won't get much better results that way than by randomly generating proposals and picking the highest scoring.
Here is my idea: you create a graph where songs are vertices and edges represent their diversity.
For example we have five songs:
"A", country, authored by John Doe
"B", country, authored by Jane Dean
"C", techno, authored by Stan Chang
"D", techno, authored by John Doe
"E", country, authored by John Doe
We assign weight 2 to artist and 1 to genre, and use the multiplicative inverse as the edge's value. Some of the edges will look like this:
A-B: 2*1 + 1*0 = 2 => value of the edge is 1/2 = 0.5
A-C: 2*1 + 1*1 = 3 => value of the edge is 1/3 = 0.33
A-D: 2*0 + 1*1 = 1 => value of the edge is 1/1 = 1
A-E: 2*0 + 1*0 = 0 => value of the edge is 1/0, i.e. MAX_DOUBLE
You can have as many categories as you want, weighted as you wish.
Once you have calculated the values of all edges between all songs, all you have to do is use some heuristic algorithm for the Travelling Salesman Problem.
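For instance, a greedy nearest-neighbour pass over those edge values might look like the following Python sketch (toy data from the list above; a real TSP heuristic such as 2-opt would do better):

from collections import namedtuple

Song = namedtuple("Song", ["title", "genre", "artist"])

songs = [
    Song("A", "country", "John Doe"),
    Song("B", "country", "Jane Dean"),
    Song("C", "techno", "Stan Chang"),
    Song("D", "techno", "John Doe"),
    Song("E", "country", "John Doe"),
]

def edge_value(a, b):
    # Weighted difference score from above: 2 for artist, 1 for genre.
    score = 2 * (a.artist != b.artist) + 1 * (a.genre != b.genre)
    return float("inf") if score == 0 else 1.0 / score

def order_playlist(songs):
    # Always append the remaining song with the cheapest (most
    # dissimilar) edge from the current last song.
    remaining = list(songs)
    tour = [remaining.pop(0)]
    while remaining:
        nxt = min(remaining, key=lambda s: edge_value(tour[-1], s))
        remaining.remove(nxt)
        tour.append(nxt)
    return tour

print([s.title for s in order_playlist(songs)])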
EDIT:
I'd like to throw another constraint on the problem: the "maximal distance" should take into account the fact that the playlist may be on repeat. This means that simply putting two songs by the same artist at opposite ends of the playlist will fail since they will be "next to" each other when the list repeats.
Part of the Travelling Salesman Problem is that in the end you return to your origin point, so you will have the same song at both ends of your playlist, and both edges (from the song and to the song) will be calculated with the best efficiency the heuristic allows. So all you have to do is remove the last entry from your result (because it's the same as the first one), and you can safely repeat without breaking your requirements.
A brute-force algorithm for that is easy; here it is sketched in Python:
from itertools import permutations

# `songs` is a list of objects with attributes named in category_weights,
# e.g. song.artist and song.genre.
category_weights = {"artist": 2, "genre": 1}

best_distance = 0
best_ordering = None
for ordering in permutations(songs):
    distance = 0
    for category, weight in category_weights.items():
        for i in range(len(ordering)):
            for j in range(i + 1, len(ordering)):
                if getattr(ordering[i], category) == getattr(ordering[j], category):
                    distance += (j - i) * weight
    if distance > best_distance:
        best_distance = distance
        best_ordering = ordering  # mark ordering as best one so far
But that algorithm has factorial complexity in the number of songs, so it will take unmanageable amounts of time pretty fast. The hard part is doing it in a reasonable time.
I was thinking about a "spring" approach. If new items are added to the end of the list, they squish the similar items forward.
If I add Pink Floyd to the list, then all other Floyd songs get squished to make space.
I would implement the least common dimensions before the most common dimensions, to ensure the more common dimensions are better managed.
for each tag in the new song, ordered by how rarely that tag occurs in the list (ascending count):
    evenly re-space the earlier songs, accounting for the new song being added
add the song

Wrap gulp-markdown in a div

Hi, I'm using https://www.npmjs.com/package/gulp-markdown
And I want to wrap the following markdown in a div:
<div ng-show="tab.title=='Grades'">
<!-- markdown goes here -->
</div>
Is this possible or is there a better gulp tool to do this?
Markdown:
Grades
------
###A-levels (AAA)
- Applicants must take Chemistry plus one subject from Biology/Human Biology, Maths or Physics
- General studies not accepted
###GCSEs
- Applicants must have minimum of a C in English and Maths. A combination of Grade A and B passes are expected especially in science subjects. Biology is recommended; Physics is recommended (or Dual Award Science).
###Scottish Highers (AAA)
- Applicants must have chemistry to a minimum of a grade B plus two subjects from Biology/Human Biology, Maths & Physics
- You cannot apply at the start of S5 level. All applications from high school must be made in S6.
###International Baccalaureate (36 points)
- 36 points are required overall (excluding TOK & bonus points)
- Applicants must achieve/be predicted 3 subjects at Higher Level at Grade 6 or higher
- Applicants must take 3 other subjects at Standard Level and achieve an average of Grade 6
- Applicants must take Chemistry and offer two of Maths, Biology or Physics over Standard and Higher levels

How to calculate the threshold value for numeric attributes in Quinlan's C4.5 algorithm?

I am trying to find out how the C4.5 algorithm determines the threshold value for numeric attributes. I have researched it and cannot understand it; in most places I've found this information:
The training samples are first sorted on the values of the attribute Y being considered. There are only a finite number of these values, so let us denote them in sorted order as {v1,v2, …,vm}.
Any threshold value lying between vi and vi+1 will have the same effect of dividing the cases into those whose value of the attribute Y lies in {v1, v2, …, vi} and those whose value is in {vi+1, vi+2, …, vm}. There are thus only m-1 possible splits on Y, all of which should be examined systematically to obtain an optimal split.
It is usual to choose the midpoint of each interval, (vi + vi+1)/2, as the representative threshold. C4.5 instead chooses the smaller value vi of each interval {vi, vi+1}, rather than the midpoint itself.
I am studying the Play/Don't Play example (value table) and do not understand how you get the number 75 (in the generated tree) for the attribute humidity when the outlook is sunny, because the humidity values for the sunny state are {70, 85, 90, 95}.
Does anyone know?
As your generated tree image implies, you consider attributes in order. Your 75 example belongs to the outlook = sunny branch. If you filter your data on outlook = sunny, you get the following table:
outlook | temperature | humidity | windy | play
sunny   | 69          | 70       | FALSE | yes
sunny   | 75          | 70       | TRUE  | yes
sunny   | 85          | 85       | FALSE | no
sunny   | 80          | 90       | TRUE  | no
sunny   | 72          | 95       | FALSE | no
As you can see, the threshold for humidity is "< 75" under this condition.
J48 is Weka's implementation of C4.5, the successor to the ID3 algorithm. It uses information gain and entropy to decide the best split. According to Wikipedia: "The attribute with the smallest entropy is used to split the set on this iteration. The higher the entropy, the higher the potential to improve the classification here."
I'm not entirely sure about J48, but assuming it's based on C4.5, it would compute the gain for all possible splits (i.e., based on the possible values of the feature). For each split it computes the information gain, and it chooses the split with the most information gain. In the case of {70, 85, 90, 95} it would compute the information gain for {70 | 85, 90, 95} vs. {70, 85 | 90, 95} vs. {70, 85, 90 | 95} and choose the best one.
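A small Python sketch of that evaluation, using the sunny-branch data from the table above (toy code, not Weka's implementation; on this subset alone the best split is at 70, and as far as I can tell the 75 in the tree appears because C4.5 snaps the threshold to the largest value in the full training set that does not exceed the chosen split point):

from math import log2

def entropy(labels):
    # Shannon entropy of a list of class labels.
    total = len(labels)
    return -sum((labels.count(c) / total) * log2(labels.count(c) / total)
                for c in set(labels))

# humidity values and play labels for the outlook = sunny branch
rows = sorted(zip([70, 70, 85, 90, 95], ["yes", "yes", "no", "no", "no"]))
base = entropy([label for _, label in rows])

best = None
for i in range(len(rows) - 1):
    if rows[i][0] == rows[i + 1][0]:
        continue  # equal values produce the same partition
    threshold = rows[i][0]  # candidate split: value <= threshold
    left = [label for value, label in rows if value <= threshold]
    right = [label for value, label in rows if value > threshold]
    gain = (base
            - len(left) / len(rows) * entropy(left)
            - len(right) / len(rows) * entropy(right))
    if best is None or gain > best[0]:
        best = (gain, threshold)

print(best)  # highest information gain and its threshold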
Quinlan's book on C4.5 is a good starting point (https://goo.gl/J2SsPf). See page 25 in particular.
