Does this bin-packing variant have a name? - bin-packing

I have what sounds like a typical bin-packing problem: x products of differing sizes need to be packed into y containers of differing capacities, minimizing the number of containers used, as well as minimizing the wasted space.
I can simplify the problem in that product sizes and container capacities can be reduced to standard 1-dimensional units; i.e., this product is 1 unit big while that one is 3 units, this box holds 6 units, that one 12. Think of eggs and cartons, or cases of beer.
But there's an additional constraint: each container has a particular attribute (we'll call it colour), and each product has a set of colours it is compatible with. There is no correlation between colour and product/container sizing; one product may be colour-compatible with the entire palette, while another may only be compatible with red containers.
Is this problem variant already described in literature? If so, what is its name?

I think there is no special name for this variant. Although the coloring constraint first gives the impression it's graph coloring related, it's not. It's simply a limitation on the values for a variable.
In a typical solver implementation, each product (= item) has a variable for the container it's assigned to. The color constraint just reduces the value range of a specific variable. So instead of specifying that all variables use the same value range, make it variable-specific. (For example, in OptaPlanner this is the difference between a value range provided by the solution generally or by the entity specifically.) So the coloring constraint doesn't even need to be a constraint: it can be part of the model in most solvers.
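For example, here's a toy sketch of that idea (hypothetical data and a naive first-fit-decreasing pass rather than a real solver): the colour rule only narrows each item's candidate containers, and the packing logic itself stays plain bin packing.

```python
# Toy sketch: colour compatibility just filters each item's candidate bins,
# then a naive first-fit-decreasing pass packs items by size.

items = {"egg": (1, {"red", "blue"}), "crate": (3, {"red"}), "keg": (6, {"blue"})}
containers = {"small_red": (6, "red"), "big_blue": (12, "blue")}

remaining = {name: capacity for name, (capacity, _) in containers.items()}
assignment = {}

for item, (size, colours) in sorted(items.items(), key=lambda kv: -kv[1][0]):
    allowed = [c for c, (_, colour) in containers.items() if colour in colours]
    for c in allowed:                      # only colour-compatible containers
        if remaining[c] >= size:
            assignment[item] = c
            remaining[c] -= size
            break

print(assignment)  # {'keg': 'big_blue', 'crate': 'small_red', 'egg': 'small_red'}
```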
Any solver that can handle bin packing should be able to handle this variant. Your problem is actually a relaxation of the ROADEF 2012 Machine Reassignment problem, which is about assigning processes to computers. Simply drop all the constraints except for a single resource-usage constraint and the constraint that excludes certain processes from certain machines. That use case is implemented in many solvers. (Although, in practice, it is probably easier to start from a basic bin-packing example such as Cloud Balancing.)

Most likely 2D bin-packing or the classic knapsack problem.


Can a B tree have more solutions?

I have these values
10,15,20,25,30,33,38,40,43,45,50
and then I insert 34
I tried 2 generators
https://s3.amazonaws.com/learneroo/visual-algorithms/BTree.html
http://ysangkok.github.io/js-clrs-btree/btree.html
and they gave me different results
On paper I tried to create the tree by inserting those values one by one and got a totally different result.
If the elements were in random order would the result be the same?
My result is this
The problem is that when I have 38|40|45 on the right and I add 50, I have to move 40 a level higher, but the internet generators also move 33 a level down and I don't see why.
Can a B tree have more solutions?
I think you're asking whether there can be more than one way to store a given set of keys in a b-tree, but you have already answered that yourself. Both of the generated examples you present contain the same keys and are valid 1-3 b-trees. The first is also a valid 1-2 b-tree. With the correction, your attempt is also a valid 1-3 b-tree.
Note well that there are different flavors of b-tree, based on how many keys the internal nodes are permitted to contain, and also that even binary trees, with which you may be more familiar, afford many different structures for the same set of two or more keys.
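As a concrete illustration (a toy example of my own, using the 1-2 flavour where each node holds one or two keys), here are two different shapes that store the same five keys. The small checker below only verifies node sizes and uniform leaf depth, not key ordering, but both shapes satisfy the structural rules:

```python
# Toy sketch: a node is (keys, children); leaves have an empty children list.

def leaf_depths(node, depth=0):
    keys, children = node
    if not children:
        return {depth}
    return set().union(*(leaf_depths(child, depth + 1) for child in children))

def is_valid_12_btree(node):
    keys, children = node
    if len(keys) not in (1, 2):                      # 1 or 2 keys per node
        return False
    if children and len(children) != len(keys) + 1:  # k keys -> k+1 children
        return False
    return (len(leaf_depths(node)) == 1              # all leaves on one level
            and all(is_valid_12_btree(child) for child in children))

# The same keys {10, 20, 30, 40, 50} stored in two different valid shapes.
tree_a = ([30], [([10, 20], []), ([40, 50], [])])
tree_b = ([20, 40], [([10], []), ([30], []), ([50], [])])

print(is_valid_12_btree(tree_a), is_valid_12_btree(tree_b))  # True True
```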
If the elements were in random order would the result be the same?
Very likely so, yes, but that's not a question of the b-tree form and structure, but rather about the implementation of the software used to construct and maintain it.
You seem confused that
in the internet generators they also put 33 a level down and I don't see why
but we can only speculate about the implementation of the software supporting those trees. It's unlikely that anyone here can tell you with certainty why they produce the particular b-tree forms they do, but those forms are valid, and so, now, is yours.

How to make the database structure extensible for a turn based game?

I'm working on a fantasy turn-based game.
I now have to create the database structure for my spells. The problem is that I don't really have a good idea on how to create it. Maybe the effects of those spells should not be stored in a database?
For instance, effects could be: increase attack, pull an enemy, heal, teleport, hide, place a mine and so on... Effects are pretty different and I would like the database structure to be extensible.
Edit:
It's a turn-based game; time is measured in turns and distance in squares.
Some examples of what I mean below.
Let's say we have Incinerate:
it can target only 1 enemy (not ally)
it can be cast at a distance of 3 squares
it deals 5 damage per turn
it lasts 3 turns
Now we can take Shock Wave:
it travels in a line for 4 squares
it starts from a square near the caster
it damages the first target it hits (ally or enemy)
it deals 5 damage to the target and knocks it back 1 square
And the last one Rain Call:
it can be cast at any distance
it's a cloud the size of a 5x5 square
it can target both allies and enemies
only fire creatures take damage
while casting, the caster is immobilized and loses 5 mana per turn
As you can see there are a lot of possible columns: the distance it travels, turns, casting distance, type (damage, heal, armor, etc), value (+2), target (enemy, ally, both), size, etc.
I would not use a relational database for storing spells. Relational databases are good in cases when most of the following conditions apply:
you have a very large amount of data,
the data can logically be organized as n-ary relations (tables, rows, columns),
you have many users that access the data concurrently,
you need ACID properties,
et cetera
Databases are like trucks. They are big. They are difficult to use. They are expensive. (in terms of needed expertise, maintenance time, run time efficiency, etc. if not monetarily) They are very good at what they are good at, but not at anything else. Don't use a truck when a bicycle would suffice.
Let's come to your problem. The number of different types of spells is surely bounded and known at compile time, so why not define an interface ISpell and let each spell type be a class that implements ISpell? (You can also define an abstract class for common code.) Then a SpellFactory can construct and provide access to all the spells when the program starts. Do you really need the spells to be accessible from outside, independent of your code?
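A rough sketch of that idea (Python here for brevity; class and method names are my own, not a prescribed API):

```python
# Sketch of "one class per spell type behind a common interface".
from abc import ABC, abstractmethod

class Spell(ABC):                    # plays the ISpell role
    name: str

    @abstractmethod
    def apply(self, caster, target):
        ...

class Incinerate(Spell):
    name = "Incinerate"

    def apply(self, caster, target):
        target.health -= 5           # the per-turn damage from the example

class SpellFactory:
    def __init__(self):
        # Register every concrete spell once, at start-up.
        self._spells = {cls.name: cls() for cls in (Incinerate,)}

    def get(self, name):
        return self._spells[name]
```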
If hard-coding a SpellFactory is not flexible enough for your purposes, you can use XML configuration files: <spell type="blind" description="bla bla" picture="file.jpg"> <effects> <effect .. /> .. </effects> <range>5</range> etc. I don't know much about computer games, but this is what they did in the Sid Meier's Civilization games, for example. Then, instead of hard-coding the different spells in the SpellFactory, you can let it read them from the configuration file at startup.
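A sketch of reading such a file with a standard XML parser (the element and attribute names follow the snippet above and are only assumptions):

```python
# Sketch: load spell definitions from an XML file instead of hard-coding them.
import xml.etree.ElementTree as ET

def load_spells(path):
    spells = {}
    for node in ET.parse(path).getroot().findall("spell"):
        spells[node.get("type")] = {
            "description": node.get("description"),
            "range": int(node.findtext("range", default="0")),
            "effects": [e.attrib for e in node.findall("effects/effect")],
        }
    return spells

# spells = load_spells("spells.xml")  # e.g. {"blind": {"range": 5, ...}}
```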
As far as I can see, using configuration files instead of a database has the following advantages:
It is a fast, easy, lightweight solution,
It is much more flexible than having all the spells share the same set of columns (most of which will not make sense for any specific spell),
It is much easier to have more than one version of the spell set at the same time, for experiments, variations, etc.,
You can let end users access and manipulate the XML files to customize the game, without letting them access a database that would also contain sensitive data,
et cetera.
The disadvantages:
More people know about relational databases than about the XML format, so you might need a couple of hours to learn how to read and manipulate XML "elements".
Your question is pretty broad. It depends on a lot of things: are you going to load the spells at runtime, or at the beginning of the game? What database will you be using?
Amit Bhargava's suggestion is good and has the advantage of being user-understandable. However, strings are pretty slow, so what you could do is use flags in your spell table. Then, based on the flag, you know which type of spell it is.

Naming conventions for non-normalized fields

Is it a common practice to use special naming conventions when you're denormalizing for performance?
For example, let's say you have a customer table with a date_of_birth column. You might then add an age_range column because sometimes it's too expensive to calculate that customer's age range on the fly. However, one could see this getting messy because it's not abundantly clear which values are authoritative and which ones are derived. So maybe you'd want to name that column denormalized_age_range or something.
Is it common to use a special naming convention for these columns? If so, are there established naming conventions for such a thing?
Edit: Here's another, more realistic example of when denormalization would give you a performance gain. This is from a real-life case. Let's say you're writing an app that keeps track of college courses at all the colleges in the US. You need to be able to show, for each degree, how many credits you graduate with if you choose that degree. A degree's credit count is actually ridiculously complicated to calculate and it takes a long time (more than one second per degree). If you have a report comparing 100 different degrees, it wouldn't be practical to calculate the credit count on the fly. What I did when I came across this problem was to add a credit_count column to our degree table and calculate each degree's credit count up front. This solved the performance problem.
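A simplified sketch of that kind of precomputed column (the table, names, and numbers below are made up for illustration):

```python
# Simplified sketch: fill a precomputed credit_count column up front.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE degree (id INTEGER PRIMARY KEY, name TEXT, credit_count INTEGER)")
conn.execute("INSERT INTO degree (name) VALUES ('Computer Science'), ('History')")

def compute_credit_count(degree_id):
    return 120 + degree_id           # stand-in for the slow real calculation

for (degree_id,) in conn.execute("SELECT id FROM degree").fetchall():
    conn.execute("UPDATE degree SET credit_count = ? WHERE id = ?",
                 (compute_credit_count(degree_id), degree_id))

print(conn.execute("SELECT name, credit_count FROM degree").fetchall())
```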
I've seen column names use the word "derived" when they represent that kind of value. I haven't seen a generic style guide for other kinds of denormalization.
I should add that in every case I've seen, the derived value is always considered secondary to the data from which it is derived.
In some programming languages, e.g. Java, variable names with the _ prefix are used for private methods or variables. Private means it should not be modified/invoked by any methods outside the class.
I wonder if this convention can be borrowed in naming derived database columns.
In Postgres, column names can start with _, e.g. _average_product_price.
It can convey the meaning that you can read this column, but don't write it because it's derived.
I'm in the same situation right now, designing a database schema that can benefit from denormalisation of central values. For example, table partitioning requires the partition key to exist in the table. So even if the data can be retrieved by following some levels of foreign keys, I need the data right there in most tables.
Maybe the suffix "copy" could be used for this. Because after all, the data is just a copy of some other location where the primary data is stored. Since it's a word, it can work with all naming conventions, like .NET PascalCase which can be mapped to SQL snake_case, e.g. CompanyIdCopy and company_id_copy. And it's a short word so you don't have to write too much. And it's not an abbreviation, so you don't have to spell it out or ever wonder what it means. ;-)
I could also think of the suffix "cache" or "cached" but a cache is usually filled on demand and invalidated some time later, which is usually not the case with denormalised columns. That data should exist at all times and never be outdated or missing.
The word "derived" is just a bit longer than "copy". I know that one special DBMS, an expensive one, has a column name limit of 30 characters, so that could be an issue.
If all of the values required for the calculation are already in the table, then it is extremely unlikely that you will gain any meaningful (or even measurable) performance benefit by persisting these calculated values.
I realize this doesn't answer the question directly, but it would seem that the premise is faulty: if such conditions existed for the question to apply, then you don't need to denormalize it to begin with.

Database optimization: What's faster searching by integers OR short strings?

I am wondering about a basic database design / data type question I am having.
I have a projects table with a field called "experience_required". I know this field will always be populated with one of these options: intern, junior, senior, or director. This list may vary a bit as time evolves, but I don't expect dramatic changes to the items on it.
Should I go for integer or string? In the future when I have tons of records like this and need to retrieve them by experience_required, will it make a difference to have them in integers?
You will probably want this field indexed. Once indexed, an integer and a small char string don't have much (read: negligible) performance difference.
Definitely go for Integer over String.
Performance will be better, and your database will be closer to being normalized.
Ultimately, you should create a new table called ExperienceLevel, with fields Id and Title. The experience_required field in the existing table should be changed to a foreign key on the other table.
This will be a much stronger design, and will be more forgiving in the case that you change the experience levels available, or decide to rename an experience level.
You can read more about Normalization here.
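A sketch of that layout (SQLite syntax here; table and column names roughly as suggested above):

```python
# Sketch: experience levels in a lookup table, referenced by a foreign key.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE experience_level (id INTEGER PRIMARY KEY, title TEXT UNIQUE);
    INSERT INTO experience_level (title) VALUES ('intern'), ('junior'), ('senior'), ('director');

    CREATE TABLE projects (
        id INTEGER PRIMARY KEY,
        name TEXT,
        experience_required INTEGER REFERENCES experience_level(id)
    );
    INSERT INTO projects (name, experience_required) VALUES ('Search rewrite', 3);
""")

# Renaming a level touches one row; the projects table never changes.
print(conn.execute("""
    SELECT p.name, e.title
    FROM projects p JOIN experience_level e ON e.id = p.experience_required
""").fetchall())   # [('Search rewrite', 'senior')]
```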
Integers. Strings should IMHO only be used to store textual data (names, addresses, text, etc).
Besides, integers are in this case better for sorting, storage space and maintenance.
In theory integers will take less memory when you index them.
You can also use enums (in MySQL), which look like strings but are stored as integers.
Doesn't matter. The difference would be negligible. What difference there is would favor the choice of integer, but this is one of the few cases in which I prefer a short text key since it will save a JOIN back to a lookup table in many reporting situations.
To muddy the waters some, I'll suggest a mix. Start with #GregSansom's idea (upvoted), but instead of integers use the CHAR(1) datatype, with values I, J, S, and D. This will give you the same performance as using tinyint, and give the extra advantage of a simple-to-remember mnemonic when (if) working directly with the data. With a bit of use, it is trivial to remember that "S" means "senior", whereas 3 does not carry any built-in meaning--particularly if, as you suggest, extra values are added over time. (Add Probationary as, say, 5, and the "low rank = low value" paradigm is out the window.)
This only works if you have a very short list of items. Get too many or too similar, and it's hard to work up usable codes.
Of course, what if these are sequential values? Sure sounds like it here. In that case, don't make them 1,2,3,4, make them 10, 20, 30, 40, so you can insert new categorizations later on. This would also allow you to easily implement ranges, such as "everyone < 30" (meaning less than "senior").
I guess my main point is: know your data, how it will be used, how it may or will change over time, and plan and code accordingly!

IDs for Information on More Than One DB/Server

I'm working on a project that I want to be as flexible and scalable as possible from the beginning. A problem I'm concerned about is one best described by Joshua Schachter in Founders at Work, who noted it as one detail he wishes he had planned for ahead of time.
Scaling past one machine, one database, is very challenging, even with replication. The tools that are there are not quite right.
For example, when you add things to a table and it numbers them, that means you can't have a second machine also adding to them because the numbers will collide. So what do you do? You have to come up with some completely different way to do it.
Do you have a central server that hands out number sets, or do you come up with something that's not numbers? Do you use random numbers and hope they never collide? Whatever it is, auto-assigned IDs just don't fly.
Has anyone here faced this problem? What are ways to move beyond auto-incremented IDs, or is there a way to have them scale with multiple servers?
Use GUID/UUID (globally/universally unique identifier). In theory it's guaranteed to be unique across multiple machines.
With GUIDs, your chances of collision are astronomically low.
It's also possible to have (what we called) SmartGUIDs (usually called COMB GUIDS - see this analysis, particularly page 7) where you can encode a timestamp within the GUID, so you get record creation date information "for free" - so you can save a timestamp column for record creation datetime - which gets back some of what you lost on moving from 32-bit integer to 128-bit GUID. These can also be guaranteed to be monotonic, unlike regular GUIDs, which can be useful for clustered indexes and for sorting.
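For a rough feel of the difference (the byte layout below is my own simplification of the COMB idea, not any standard):

```python
# Sketch: a random UUID versus a "COMB"-style ID whose leading bytes are a
# millisecond timestamp, so values sort roughly by creation time.
import os
import time
import uuid

def comb_like_id():
    millis = int(time.time() * 1000)
    return uuid.UUID(bytes=millis.to_bytes(6, "big") + os.urandom(10))

print(uuid.uuid4())     # unique, but in no useful order
print(comb_like_id())   # unique and (approximately) time-ordered
```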
You can also use composite keys with some kind of server/db ID with a regular auto-increment identity or auto-number.
