Help with designing a schema for a lyrics database - database

I'd like to work on a project, but it's a little odd. I want to create a site that shows lyrics and their translations, but they are shown simultaneously side-by-side (so this isn't just a normal i18n of the site).
I have normalized the tables like this (formatted to show hierarchy).
artists
artistNames
albums
albumNames
tracks
trackNames
trackLyrics
user
So questions,
First, that'll be a whopping seven joins. I must have written pretty small queries in the past because I've never come across something like this. Is joining so many tables a bad thing? I'm pretty sure I'll be using SQLite for this project, but does anyone think PostgreSQL or MySQL could perform better with a pretty big join like this?
Second, my current self-built framework uses a data mapper to create domain objects. This is the first time I will be working with so many one-to-many relationships, so my mapper really only takes one row as one object. For example,
id name
------ ----------
1 Jackie Chan
2 Stephen Chow
So it's super easy to map objects. But with those one to many relationships...
id language name
------ ---------- -------
1 en Jackie Chan
1 zh 陳港生
2 en Stephen Chow
2 zh 周星馳
...I'm not sure what to do. Is looping through the result set to create a massive array and feeding it to my domain object factory the only option when dealing with a data set like this?
<?php
array(
array(
'id' => 1,
'names' => array(
'en' => 'Jackie Chan',
'zh' => '陳港生'
)
),
array(
'id' => 2,
'names' => array(
'en' => 'Stephen Chow',
'zh' => '周星馳'
)
)
);
?>
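For illustration, here is a minimal sketch of that grouping step, written in Python rather than PHP for brevity. The function name `group_names` and the tuple layout of each row are assumptions taken from the result set above; it collapses the one-to-many rows into one nested record per id in a single pass.

```python
def group_names(rows):
    """Collapse (id, language, name) rows into one record per id."""
    grouped = {}  # dicts keep insertion order, so output follows row order
    for artist_id, language, name in rows:
        # Create the record the first time an id is seen, then add the name.
        record = grouped.setdefault(artist_id, {"id": artist_id, "names": {}})
        record["names"][language] = name
    return list(grouped.values())

rows = [
    (1, "en", "Jackie Chan"),
    (1, "zh", "陳港生"),
    (2, "en", "Stephen Chow"),
    (2, "zh", "周星馳"),
]
artists = group_names(rows)
```

Each element of `artists` could then be handed to the domain object factory as one unit.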
I have an itch to just denormalize these tables so I can get my one row per object application working, but I've always read this is not the way to go.
Third, does this schema sound right for the job?

Twelve way joins are not unheard of in serious industrial work. You need sufficient hardware, a strong DBMS, and good database design. Seven way joins should be a breeze for any good environment.
You separate out data, as needed, to avoid difficulties like database update anomalies. These anomalies are what you get when you don't follow the normalization rules. You join data as needed to get the data that you need in a single result.
Sometimes it's better to ignore some of the normalization rules when you build a database. In that case, you need an alternative set of design principles in order to avoid design by trial and error. The amount of joining you are doing has little to do with the disadvantages of looping through results or unfortunate mapping between tuples and objects.
Most of the mappings between tuples (table rows) and objects are done in an incorrect fashion. A tuple is an object, but it isn't an application oriented object. This can cause either performance problems or difficult programming or both.
As far as you can avoid it, don't loop through results, one row at a time. Deal with results as a set of data. If you can't do that in PHP, then you need to learn how, or get a better programming environment.

Just a note. I'm not really sure that 7 tables is that big a join. I seem to remember that Postgres has a special query optimiser (based on a genetic algorithm, no less) that only kicks in once you join 12 tables or more.

The general rule is to make the schema as normalized as possible. Then perform stress tests with the expected amount of data. If you find performance bottlenecks, try to optimize in the following order:
1. Profile and optimize queries:
   - add indices to the schema
   - add hints for the query optimizer (I don't know if SQLite has any, but most databases do)
2. If 1. does not yield any performance benefit, consider denormalizing the database.
Denormalizing a database is usually needed only if you work with "large" amounts of data. I checked several lyrics databases on the internet, and the largest I found has lyrics for about 400,000 songs. Let's assume you can find 1,000,000 lyrics performed by 500,000 artists. That is an amount of data that any database can easily handle on an average modern computer.

Doing this many joins shouldn't be an issue on any serious DB. I haven't worked with SQLite enough to know if it's in the "serious" category. The only way to find out would be to create your schema, load up a lot of data and start looking at query plans (visual explains are very useful here). When I am doing these kinds of tests, I usually shoot for 10x the data I expect to have in production. If things work OK with this much data, I know I should be OK with real data.
Also, depending on how you need to retrieve the data, you may want to try subqueries instead of joins:
select a.*, (select r.name from artist r where r.id = a.artist and r.locale = 'en') as artist_name from album a where a.id = 1;
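A minimal sketch of trying this out, assuming Python's built-in `sqlite3` module and a toy `album`/`artist` layout with one `artist` row per locale. The column names and sample rows are made up for illustration. It runs the correlated subquery and then asks SQLite for the query plan, which is the thing to look at once real data is loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE artist (id INTEGER, locale TEXT, name TEXT,
                         PRIMARY KEY (id, locale));
    CREATE TABLE album  (id INTEGER PRIMARY KEY, artist INTEGER, title TEXT);
    INSERT INTO artist VALUES (1, 'en', 'Jackie Chan'), (1, 'zh', '陳港生');
    INSERT INTO album  VALUES (1, 1, 'Example Album');  -- hypothetical row
""")

query = """
    SELECT a.id, a.title,
           (SELECT r.name FROM artist r
             WHERE r.id = a.artist AND r.locale = 'en') AS artist_name
      FROM album a
     WHERE a.id = 1
"""
album_row = conn.execute(query).fetchone()

# EXPLAIN QUERY PLAN is SQLite's way of describing how it will run a query.
plan = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()
```

Printing `plan` shows whether each lookup uses an index or scans a table.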

I've helped a friend optimize a web storefront. In your case, it's a lot the same.
First. What is your priority, webpage speed or update speed?
Normal forms were designed to make data maintenance simple. If Prince changes his name again, voila, just one row is updated. But if you want your web pages to render as fast as possible, then 3rd normal form isn't your best plan. Yes, everyone is correct that it will do a 7-way join no problem, but that will be dozens of I/Os: an index lookup on every table, then table access by rowid, again and again. If you denormalize for web page loading speed you may do 2 or 3 I/Os. That also allows for greater scaling: since every page hit needs fewer I/Os to complete, you'll be able to serve more simultaneous hits before maxing out your I/O.
But there's no reason not to do both. You can keep the base data, the official copy, in a normal form, then write a script that generates a denormalized table for web performance. If it's not that big, you can regenerate the whole thing in a few minutes of maintenance downtime. If it is very big, you may need to be smart about the update and only change what needs to be changed, keeping change vectors in an intermediate driving table.
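A minimal sketch of such a regeneration script, assuming Python's built-in `sqlite3` module. The table names `tracks`, `trackNames` and `trackLyrics` come from the question; their columns, the `track_page` table and the sample rows are assumptions made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tracks      (id INTEGER PRIMARY KEY);
    CREATE TABLE trackNames  (track_id INTEGER, language TEXT, name TEXT);
    CREATE TABLE trackLyrics (track_id INTEGER, language TEXT, lyrics TEXT);
    INSERT INTO tracks VALUES (1);
    INSERT INTO trackNames  VALUES (1, 'en', 'Song'), (1, 'zh', '歌');
    INSERT INTO trackLyrics VALUES (1, 'en', 'la la'), (1, 'zh', '啦啦');
""")

def rebuild_track_page(conn):
    """Throw away the flat table and rebuild it from the normalized copy."""
    conn.execute("DROP TABLE IF EXISTS track_page")
    conn.execute("""
        CREATE TABLE track_page AS
        SELECT t.id AS track_id, n.language, n.name, l.lyrics
          FROM tracks t
          JOIN trackNames  n ON n.track_id = t.id
          JOIN trackLyrics l ON l.track_id = t.id
                            AND l.language = n.language
    """)
    conn.commit()

rebuild_track_page(conn)

# The web page now reads one flat table instead of joining three.
page_rows = conn.execute(
    "SELECT language, name, lyrics FROM track_page "
    "WHERE track_id = 1 ORDER BY language"
).fetchall()
```

The normalized tables stay the official copy; `track_page` is disposable and can be rebuilt at any time.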
But at the heart of your design I have a question.
Artist names change over time. John Cougar became John Cougar Melonhead (or something) and then later he became John Mellencamp. Do you care which John did a song? Will you stamp the entries with from and to valid dates?
It looks like you have a 1-n relationship from artists to albums, but that really should be many-to-many.
Sometimes the same album is released more than once, with different included tracks and sometimes with different names for a track. Think international releases. Or bonus tracks. How will you know that's all the same album?
If you don't care about those details then why bother with normalization? If Jon and Vangelis is 1 artist, then there is simply no need to normalize. You're not interested in the answers normalization will provide.

Is polyglot persistence with a graph database for relationships a good idea?

I would like to know whether it is worth using a graph database specifically to handle relationships.
I intend to use a relational database for storing entities like "User", "Page", "Comment", "Post" etc.
But in most cases of a typical social graph based workload, I have to do deep traversals, which relational databases are not good at and which involve slow joins.
Example: Comment -(made_in)-> Post -(made_in)-> Page etc...
I'm thinking of doing something like this:
Example:
User id: 1
Query: Get all followers of user_id 1
Query Neo4j for all outgoing edges named "follows" for the user node with id 1
With the resulting list of ids, query the Users table:
SELECT *
FROM users
WHERE user_id IN (ids)
Is this slow?
I have seen the question "Is it a good idea to use MySQL and Neo4j together?", but I still cannot understand why the correct answer says that it is not a good idea.
Thanks
Using Neo4j is a great choice of technologies for an application like yours, that requires deep traversals. The reason it's a good choice is two-fold: one is that the Cypher language makes such queries very easy. The second is that deep traversals happen very quickly, because of the way the data is structured in the database.
In order to reap both of these benefits, you will want to have both the relationships and the people (as nodes) in the graph. Then you'll be able to do a friend-of-friends query as follows:
START john=node:node_auto_index(name = 'John')
MATCH john-[:friend]->()-[:friend]->fof
RETURN john, fof
and a friend-of-friend-of-friend query as follows:
START john=node:node_auto_index(name = 'John')
MATCH john-[:friend]->()-[:friend]->()-[:friend]->fofof
RETURN john, fofof
...and so on. (Same idea for posts and comments, just replace the name.)
Using Neo4j alongside MySQL is fine, but I wouldn't do it in this particular way, because the code will be much more complex, and you'll lose too much time hopping between Neo4j and MySQL.
Best of luck!
Philip
In general, the more databases/systems/layers you've got, the more complex the overall setup and operating will be.
Think about all those tasks like synchronization, export/import, backup/archive etc. which become quite expensive if your database(s) grow in size.
People use polyglot persistence only if the benefits of having dedicated and specialized databases outweigh the drawbacks of having to cope with multiple data stores. For example, this can be the case if you have a large number of data items (activity or transaction logs, for example), each related to a user. It would probably make no sense to store all the information in a graph database if you're only interested in the connections between the data items. So you would be better off storing only the relations in the graph (where the nodes hold just a pointer into the other database), and the data per item in a K/V store or the like.
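A minimal sketch of that split, assuming Python's built-in `sqlite3` module for the per-item data. The graph store is replaced here by a plain dictionary of "follows" edges, purely as a stand-in; in a real setup that lookup would be a graph query. The names `follows_edges` and `followed_users` and the sample users are hypothetical.

```python
import sqlite3

# Stand-in for the graph store: only the "follows" edges, keyed by user id.
# Each node is just an id that points into the relational table below.
follows_edges = {1: [2, 3], 2: [3]}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(1, "ann"), (2, "bob"), (3, "cat")])

def followed_users(user_id):
    """Step 1: ids from the graph. Step 2: rows from the relational store."""
    ids = follows_edges.get(user_id, [])
    if not ids:
        return []
    placeholders = ",".join("?" for _ in ids)  # one "?" per id
    sql = ("SELECT user_id, name FROM users "
           f"WHERE user_id IN ({placeholders}) ORDER BY user_id")
    return conn.execute(sql, ids).fetchall()
```

Every call costs one round trip to each store, which is the hopping overhead to weigh against the benefit of keeping the two stores separate.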
For your example use case, I would go only for one database, namely Neo4j, because it's a graph.
As the other answers indicate, using Neo4j as your single data store is preferable. However, in some cases there might not be much choice in the matter, because you already have another database behind your product. I would just like to add that if this is the case, running Neo4j as your secondary database does work (the product I work on operates in this mode). You do have to work extra hard at figuring out what functionality you expect out of Neo4j, what kind of data you need for it, how to keep the data in sync, and the consequences of not always having real-time results. Most of our use cases can work with near-real-time results, so we are fine. But that may not be the case for your product. Still, to me, using Neo4j in this mode is preferable to running without it.
We are able to produce a lot of graphy-great stuff as a result of it.

Which is better: redundancy with faster data access, or no redundancy and slower data access?

I want to create a database for a forum website...
All the users of the forum website will be stored in a table named USERS with the following fields :
user_name
user_ID
(and additional details)
there will be a single table named FORUMS with the following fields :
forum_ID
forum_creatorID(which is the ID of one of the users)
forum_topic
replies
views
And for each forum(for each row in the FORUMS table) created, there'll be a separate table which is named as "forum_ID"_replies , where the exact forum_ID of that forum will be replaced within the quotes...
thus, each forum will have a separate table where all the replies for that particular forum will be saved...
the fields in the "forum_ID"_replies table are
user_ID
user_name
comment
timestamp(for the comment)
I hope I made my design clear... now, my doubt is this:
I saved user_name as one of the fields in each "forum_ID"_replies table. But I think the user_name can be looked up in the USERS table using the user_ID, instead of storing it in each "forum_ID"_replies table. In this way, redundancy is reduced.
But if user_name is stored in each table, the lookup for user_name is avoided, and the result can be displayed faster.
Which is more optimal?
Storing names along with their IDs for faster access, or storing only the IDs to avoid redundancy?
"Optimal", "better" etc. are all subjective.
Most database designers would have several problems with your proposal.
Database normalization recommends not duplicating data - for good reason. What happens if your user changes their username? You have to update the user table, but also find all the "forum_id"_replies tables where their username occurs; if you mess that up, all of a sudden, you have a fairly obvious bug - people think they're replying to "bob", but they're actually replying to "jane".
From a performance point of view, unless you have esoteric performance demands (e.g. you're running Facebook), the join to the user table will have no measurable impact - you're joining on a primary key column, and this is what databases are really, really good at.
Finally, creating separate tables for each forum is not really a good idea unless you have huge performance/scalability needs (read: you're Facebook) - the additional complexity in maintaining the database, building queries, connecting your apps to the database etc. is significant; the performance overhead of storing multiple forums in a single table usually is not.
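A minimal sketch of the conventional layout, assuming Python's built-in `sqlite3` module: one `replies` table for all forums, holding only `user_id`, with the name fetched through a primary-key join. The table and column names beyond those in the question (`replies`, `reply_id`, `posted_at`) are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users   (user_id INTEGER PRIMARY KEY, user_name TEXT);
    CREATE TABLE replies (
        reply_id  INTEGER PRIMARY KEY,
        forum_id  INTEGER NOT NULL,
        user_id   INTEGER NOT NULL REFERENCES users(user_id),
        comment   TEXT,
        posted_at TEXT
    );
    CREATE INDEX replies_by_forum ON replies (forum_id);
    INSERT INTO users VALUES (1, 'bob');
    INSERT INTO replies VALUES (1, 10, 1, 'first!', '2020-01-01 00:00:00');
""")

def replies_for(forum_id):
    """All replies of one forum, with the author's current name."""
    return conn.execute("""
        SELECT u.user_name, r.comment
          FROM replies r
          JOIN users u ON u.user_id = r.user_id
         WHERE r.forum_id = ?
         ORDER BY r.posted_at
    """, (forum_id,)).fetchall()

before = replies_for(10)
# A rename touches exactly one row; every reply picks it up via the join.
conn.execute("UPDATE users SET user_name = 'jane' WHERE user_id = 1")
after = replies_for(10)
```

With the name stored in one place, there is no second copy that can drift out of date.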
"Better" depends on your criteria. If (as you write in the comments) you are concerned about scalability and supporting huge numbers of posts, my recommendation is to start by building a way of testing and measuring your scalability levels. Once you can test and measure, you can test different solutions, and know whether they have a material impact - very often, this shows counter-intuitive outcomes. Performance optimizations often come at the expense of other criteria - your design, for instance, is more error prone (repeated information means you can get discrepancies) and more expensive to code (writing the logic to join to different tables for each forum). If you can't prove that it has a material benefit in scalability, and that this benefit meets your business requirements, you're probably wasting time & money.
You can use tools like DBMonster to populate your database with test data, and JMeter to run lots of concurrent database queries - use those tools to try both solutions, and see if your solution is, indeed, faster.

Is this a "correct" database design?

I'm working with the new version of a third party application. In this version, the database structure is changed, they say "to improve performance".
The old version of the DB had a general structure like this:
TABLE ENTITY
(
ENTITY_ID,
STANDARD_PROPERTY_1,
STANDARD_PROPERTY_2,
STANDARD_PROPERTY_3,
...
)
TABLE ENTITY_PROPERTIES
(
ENTITY_ID,
PROPERTY_KEY,
PROPERTY_VALUE
)
so we had a main table with fields for the basic properties and a separate table to manage custom properties added by user.
The new version of the DB instead has a structure like this:
TABLE ENTITY
(
ENTITY_ID,
STANDARD_PROPERTY_1,
STANDARD_PROPERTY_2,
STANDARD_PROPERTY_3,
...
)
TABLE ENTITY_PROPERTIES_n
(
ENTITY_ID_n,
CUSTOM_PROPERTY_1,
CUSTOM_PROPERTY_2,
CUSTOM_PROPERTY_3,
...
)
So now, when the user adds a custom property, a new column is added to the current ENTITY_PROPERTIES_n table until the max number of columns (managed by the application) is reached; then a new table is created.
So, my question is: Is this a correct way to design a DB structure? Is this the only way to "increase performance"? The old structure required many joins or sub-selects, but this structure doesn't seem very smart (or even correct) to me...
I have seen this done before on the assumed (often unproven) "expense" of joining - it is basically turning a row-heavy data table into a column-heavy table. They ran into their own limitation, as you imply, by creating new tables when they run out of columns.
I completely disagree with it.
Personally, I would stick with the old structure and re-evaluate the performance issues. That isn't to say the old way is the correct way, it is just marginally better than the "improvement" in my opinion, and removes the need to do large scale re-engineering of database tables and DAL code.
These tables strike me as largely static... caching would be an even better performance improvement without mutilating the database and one I would look at doing first. Do the "expensive" fetch once and stick it in memory somewhere, then forget about your troubles (note, I am making light of the need to manage the Cache, but static data is one of the easiest to manage).
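A minimal sketch of that caching idea, assuming Python's built-in `sqlite3` and `functools.lru_cache`, with the old ENTITY_PROPERTIES layout from the question. The function name `entity_properties` and the sample rows are hypothetical.

```python
import functools
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ENTITY_PROPERTIES (
        ENTITY_ID INTEGER, PROPERTY_KEY TEXT, PROPERTY_VALUE TEXT
    );
    INSERT INTO ENTITY_PROPERTIES VALUES
        (1, 'colour', 'green'), (1, 'size', 'large');
""")

@functools.lru_cache(maxsize=1024)
def entity_properties(entity_id):
    """Run the "expensive" fetch once; repeat calls are served from memory."""
    rows = conn.execute(
        "SELECT PROPERTY_KEY, PROPERTY_VALUE FROM ENTITY_PROPERTIES "
        "WHERE ENTITY_ID = ? ORDER BY PROPERTY_KEY",
        (entity_id,),
    ).fetchall()
    return tuple(rows)  # immutable, so callers cannot alter the cached copy

first = entity_properties(1)
second = entity_properties(1)  # served from the cache, no query is run
```

When a property does change, `entity_properties.cache_clear()` discards the cached results. That is the cache management alluded to above.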
Or, wait for the day you run into the maximum number of tables per database :-)
Others have suggested completely different stores. This is a perfectly viable possibility and if I didn't have an existing database structure I would be considering it too. That said, I see no reason why this structure can't fit into an RDBMS. I have seen it done on almost all large scale apps I have worked on. Interestingly enough, they all went down a similar route and all were mostly "successful" implementations.
No, it's not. It's terrible.
"until the max number of column (handled by application) is reached, then a new table is created."
This sentence says it all. Under no circumstance should an application dynamically create tables. The "old" approach isn't ideal either, but since you have the requirement to let users add custom properties, it has to be like this.
Consider this:
You lose all type-safety as you have to store all values in the column "PROPERTY_VALUE"
Depending on your users, you could have them change the schema beforehand and then let them run some kind of database update batch job, so at least all the properties would be declared in the right datatype. Also, you could lose the entity_id/key thing.
Check out this: http://en.wikipedia.org/wiki/Inner-platform_effect. This certainly reeks of it
Maybe an RDBMS isn't the right thing for your app. Consider using a document or key/value based store such as MongoDB, or another NoSQL database. (http://nosql-database.org/)
From what I know of databases (but I'm certainly not the most experienced), it seems quite a bad idea to do that in your database. If you already know how many max custom properties a user might have, I'd say you'd better set the table number of columns to that value.
Then again, I'm not an expert, but making new columns on the fly isn't the kind of operation databases like. It's going to bring you more trouble than anything.
If I were you, I'd either fix the number of custom properties, or stick with the old system.
I believe creating a new table for each entity to store properties is a bad design as you could end up bulking the database with tables. The only pro to applying the second method would be that you are not traversing through all of the redundant rows that do not apply to the Entity selected. However using indexes on your database on the original ENTITY_PROPERTIES table could help greatly with performance.
I would personally stick with your initial design, apply indexes and let the database engine determine the best methods for selecting the data rather than separating each entity property into a new table.
There is no "correct" way to design a database - I'm not aware of a universally recognized set of standards other than the famous "normal form" theory; many database designs ignore this standard for performance reasons.
There are ways of evaluating database designs though - performance, maintainability, intelligibility, etc. Quite often, you have to trade these against each other; that's what your change seems to be doing - trading maintainability and intelligibility against performance.
So, the best way to find out if that was a good trade off is to see if the performance gains have materialized. The best way to find that out is to create the proposed schema, load it with a representative dataset, and write queries you will need to run in production.
I'm guessing that the new design will not be perceivably faster for queries like "find STANDARD_PROPERTY_1 from entity where STANDARD_PROPERTY_1 = 'banana'".
I'm guessing it will not be perceivably faster when retrieving all properties for a given entity; in fact it might be slightly slower, because instead of a single join to ENTITY_PROPERTIES, the new design requires joins to several tables. You will be returning "sparse" results - presumably, not all entities will have values in the property_n columns in all ENTITY_PROPERTIES_n tables.
Where the new design may be significantly faster is when you need a compound where clause on custom properties. For instance, finding an entity where custom property 1 is true, custom property 2 is banana, and custom property 3 is not in ('kylie', 'pussycat dolls', 'giraffe') is (probably) faster when you can specify columns in the ENTITY_PROPERTIES_n tables instead of rows in the ENTITY_PROPERTIES table. Probably.
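To make the difference concrete, here is a minimal sketch of that compound condition against both layouts, assuming Python's built-in `sqlite3` module. The property keys `prop1` to `prop3`, the sample values, and the use of a plain `ENTITY_ID` column in the wide table are simplifying assumptions. It shows the shape of the two queries, not their relative speed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ENTITY_PROPERTIES (
        ENTITY_ID INTEGER, PROPERTY_KEY TEXT, PROPERTY_VALUE TEXT
    );
    INSERT INTO ENTITY_PROPERTIES VALUES
        (1, 'prop1', 'true'), (1, 'prop2', 'banana'), (1, 'prop3', 'zebra'),
        (2, 'prop1', 'true'), (2, 'prop2', 'banana'), (2, 'prop3', 'giraffe');

    CREATE TABLE ENTITY_PROPERTIES_1 (
        ENTITY_ID INTEGER, CUSTOM_PROPERTY_1 TEXT,
        CUSTOM_PROPERTY_2 TEXT, CUSTOM_PROPERTY_3 TEXT
    );
    INSERT INTO ENTITY_PROPERTIES_1 VALUES
        (1, 'true', 'banana', 'zebra'), (2, 'true', 'banana', 'giraffe');
""")

# Old layout: one self-join per property that appears in the condition.
eav_ids = conn.execute("""
    SELECT p1.ENTITY_ID
      FROM ENTITY_PROPERTIES p1
      JOIN ENTITY_PROPERTIES p2 ON p2.ENTITY_ID = p1.ENTITY_ID
      JOIN ENTITY_PROPERTIES p3 ON p3.ENTITY_ID = p1.ENTITY_ID
     WHERE p1.PROPERTY_KEY = 'prop1' AND p1.PROPERTY_VALUE = 'true'
       AND p2.PROPERTY_KEY = 'prop2' AND p2.PROPERTY_VALUE = 'banana'
       AND p3.PROPERTY_KEY = 'prop3'
       AND p3.PROPERTY_VALUE NOT IN ('kylie', 'pussycat dolls', 'giraffe')
""").fetchall()

# New layout: a plain WHERE clause over columns of one table.
wide_ids = conn.execute("""
    SELECT ENTITY_ID FROM ENTITY_PROPERTIES_1
     WHERE CUSTOM_PROPERTY_1 = 'true' AND CUSTOM_PROPERTY_2 = 'banana'
       AND CUSTOM_PROPERTY_3 NOT IN ('kylie', 'pussycat dolls', 'giraffe')
""").fetchall()
```

The wide query stays this simple only while all three properties live in the same ENTITY_PROPERTIES_n table; once they are spread over several, joins come back.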
As for maintainability - yuck. Your database access code now needs to be far smarter, knowing which table holds which property, and how many columns are too many. The likelihood of entertaining bugs is high - there are more moving parts, and I can't think of any obvious unit tests to make sure that the database access logic is working.
Intelligibility is another concern - this solution is not in most developers' toolbox, it's not an industry-standard pattern. The old solution is pretty widely known - commonly referred to as "entity-attribute-value". This becomes a major issue on long-lived projects where you can't guarantee that the original development team will hang around.

Designing tables for storing various requirements and stats for multiplayer game

Original Question:
Hello,
I am creating very simple hobby project - browser based multiplayer game. I am stuck at designing tables for storing information about quest / skill requirements.
For now, I designed my tables in following way:
table user (basic information about users)
table stat (variety of stats)
table user_stats (connecting each user with stats)
Another example:
table monsters (basic information about npc enemies)
table monster_stats (connecting monsters with stats, using the same stat table from above)
Those were the simple cases. I must admit, that I am stuck while designing requirements for different things, e.g quests. Sample quest A might have only minimum character level requirement (and that is easy to implement) - but another one, quest B has multitude of other reqs (finished quests, gained skills, possessing specific items, etc) - what is a good way of designing tables for storing this kind of information?
In a similar manner - what is an efficient way of storing information about skill requirements? (specific character class, min level, etc).
I would be grateful for any help or information about creating database driven games.
Edit:
Thank you for the answers, but I would like to receive more. As I am having some problems designing a rather complicated database layout for craftable items, I am starting a max bounty for this question.
I would like to receive links to articles / code snippets / anything connected with best practices for designing databases for storing game data (a good example of this kind of information is available on buildingbrowsergames.com).
I would be grateful for any help.
I'll edit this to add as many other pertinent issues as I can, although I wish the OP would address my comment above. I speak from several years as a professional online game developer and many more years as a hobbyist online game developer, for what it's worth.
Online games imply some sort of persistence, which means that you have broadly two types of data - one is designed by you, the other is created by the players in the course of play. Most likely you are going to store both in your database. Make sure you have different tables for these and cross-reference them properly via the usual database normalisation rules. (eg. If your player crafts a broadsword, you don't create an entire new row with all the properties of a sword. You create a new row in the player_items table with the per-instance properties, and refer to the broadsword row in the item_types table which holds the per-itemtype properties.) If you find a row of data is holding some things that you designed and some things that the player is changing during play, you need to normalise it out into two tables.
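A minimal sketch of the broadsword example, assuming Python's built-in `sqlite3` module. The table names `item_types` and `player_items` come from the paragraph above; the columns (`base_damage`, `durability`) and the `craft` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Designer-authored data: one row per kind of item.
    CREATE TABLE item_types (
        id INTEGER PRIMARY KEY, name TEXT, base_damage INTEGER
    );
    -- Player-generated data: one row per item that exists in the world.
    CREATE TABLE player_items (
        id           INTEGER PRIMARY KEY,
        player_id    INTEGER NOT NULL,
        item_type_id INTEGER NOT NULL REFERENCES item_types(id),
        durability   INTEGER NOT NULL
    );
    INSERT INTO item_types VALUES (1, 'broadsword', 12);
""")

def craft(player_id, item_type_id, durability=100):
    """Crafting adds an instance row only; the type row is shared."""
    cur = conn.execute(
        "INSERT INTO player_items (player_id, item_type_id, durability) "
        "VALUES (?, ?, ?)",
        (player_id, item_type_id, durability),
    )
    return cur.lastrowid

sword_id = craft(player_id=7, item_type_id=1)
sword = conn.execute("""
    SELECT t.name, t.base_damage, i.durability
      FROM player_items i
      JOIN item_types t ON t.id = i.item_type_id
     WHERE i.id = ?
""", (sword_id,)).fetchone()
```

Rebalancing the broadsword later means editing one `item_types` row, and no player-owned row is touched.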
This is really the typical class/instance separation issue, and applies to many things in such games: a goblin instance doesn't need to store all the details of what it means to be a goblin (eg. green skin), only things pertinent to that instance (eg. location, current health). Some times there is a subtlety to the act of construction, in that instance data needs to be created based on class data. (Eg. setting a goblin instance's starting health based upon a goblin type's max health.) My advice is to hard-code these into your code that creates the instances and inserts the row for it. This information only changes rarely since there are few such values in practice. (Initial scores of depletable resources like health, stamina, mana... that's about it.)
Try and find a consistent terminology to separate instance data from type data - this will make life easier later when you're patching a live game and trying not to trash the hard work of your players by editing the wrong tables. This also makes caching a lot easier - you can typically cache your class/type data with impunity because it only ever changes when you, the designer, pushes new data up there. You can run it through memcached, or consider loading it all at start up time if your game has a continuous process (ie. is not PHP/ASP/CGI/etc), etc.
Remember that deleting anything from your design-side data is risky once you go live, since player-generated data may refer back to it. Test everything thoroughly locally before deploying to the live server because once it's up there, it's hard to take it down. Consider ways to be able to mark rows of such data as removed in a safe fashion - maybe a boolean 'live' column which, if set to false, means it just won't show up in the typical query. Think about the impact on players if you disable items they earned (and doubly if these are items they paid for).
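A minimal sketch of the 'live' column idea, assuming Python's built-in `sqlite3` module. The view name `live_item_types`, the `retire` helper and the sample items are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item_types (
        id   INTEGER PRIMARY KEY,
        name TEXT,
        live INTEGER NOT NULL DEFAULT 1   -- 0 means retired, never deleted
    );
    INSERT INTO item_types (id, name)
        VALUES (1, 'broadsword'), (2, 'beta test hat');
    -- Typical game queries read the view, so retired rows stay out of sight.
    CREATE VIEW live_item_types AS
        SELECT id, name FROM item_types WHERE live = 1;
""")

def retire(item_type_id):
    """Hide a type from normal queries without breaking references to it."""
    conn.execute("UPDATE item_types SET live = 0 WHERE id = ?",
                 (item_type_id,))

retire(2)
visible = conn.execute(
    "SELECT name FROM live_item_types ORDER BY id").fetchall()
total = conn.execute("SELECT COUNT(*) FROM item_types").fetchone()[0]
```

Player rows that still refer to the retired type keep resolving, because the underlying row is never removed.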
The actual crafting side can't really be answered without knowing how you want to design your game. The database design must follow the game design. But I'll run through a trivial idea. Maybe you will want to be able to create a basic object and then augment it with runes or crystals or whatever. For that, you just need a one-to-many relationship between item instance and augmentation instance. (Remember, you might have item type and augmentation type tables too.) Each augmentation can specify a property of an item (eg. durability, max damage done in combat, weight) and a modifier (typically as a multiplier, eg. 1.1 to add a 10% bonus). You can see my explanation for how to implement these modifying effects here and here - the same principles apply for temporary skill and spell effects as apply for permanent item modification.
For character stats in a database driven game, I would generally advise to stick with the naïve approach of one column (integer or float) per statistic. Adding columns later is not a difficult operation and since you're going to be reading these values a lot, you might not want to be performing joins on them all the time. However, if you really do need the flexibility, then your method is fine. This strongly resembles the skill level table I suggest below: lots of game data can be modelled in this way - map a class or instance of one thing to a class or instance of other things, often with some additional data to describe the mapping (in this case, the value of the statistic).
Once you have these basic joins set up - and indeed any other complex queries that result from the separation of class/instance data in a way that may not be convenient for your code - consider creating a view or a stored procedure to perform them behind the scenes so that your application code doesn't have to worry about it any more.
Other good database practices apply, of course - use transactions when you need to ensure multiple actions happen atomically (eg. trading), put indices on the fields you search most often, use VACUUM/OPTIMIZE TABLE/whatever during quiet periods to keep performance up, etc.
(Original answer below this point.)
To be honest I wouldn't store the quest requirement information in the relational database, but in some sort of script. Ultimately your idea of a 'requirement' takes on several varying forms which could draw on different sorts of data (eg. level, class, prior quests completed, item possession) and operators (a level might be a minimum or a maximum, some quests may require an item whereas others may require its absence, etc) not to mention a combination of conjunctions and disjunctions (some quests require all requirements to be met, whereas others may only require 1 of several to be met). This sort of thing is much more easily specified in an imperative language. That's not to say you don't have a quest table in the DB, just that you don't try and encode the sometimes arbitrary requirements into the schema. I'd have a requirement_script_id column to reference an external script. I suppose you could put the actual script into the DB as a text field if it suits, too.
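A minimal sketch of what such requirement scripts could look like, here in Python. Every name in it (`min_level`, `has_item`, `all_of`, the `player` dictionary layout, the `quest_b` id and its conditions) is an assumption made up for illustration. The point is only that minimums, absences, conjunctions and disjunctions compose naturally in code.

```python
def min_level(n):
    return lambda player: player["level"] >= n

def has_item(name):
    return lambda player: name in player["items"]

def finished(quest):
    return lambda player: quest in player["finished_quests"]

def all_of(*reqs):
    return lambda player: all(r(player) for r in reqs)

def any_of(*reqs):
    return lambda player: any(r(player) for r in reqs)

def none_of(*reqs):
    return lambda player: not any(r(player) for r in reqs)

# Keyed by the value stored in the quest table's requirement_script_id column.
REQUIREMENT_SCRIPTS = {
    "quest_b": all_of(
        min_level(10),
        any_of(finished("quest_a"), has_item("royal seal")),
        none_of(has_item("cursed idol")),  # requires the item's absence
    ),
}

def can_start(script_id, player):
    """Look up the script for a quest and evaluate it for one player."""
    return REQUIREMENT_SCRIPTS[script_id](player)

player = {"level": 12, "items": {"royal seal"}, "finished_quests": set()}
allowed = can_start("quest_b", player)
```

Adding a new kind of requirement means adding one small function, with no schema change.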
Skill requirements are suited to the DB though, and quite trivial given the typical game system of learning skills as you progress through levels in a certain class:
table skill_levels
{
int skill_id FOREIGN KEY;
int class_id FOREIGN KEY;
int min_level;
}
myPotentialSkillList = SELECT * FROM skill_levels INNER JOIN
skill ON skill_levels.skill_id = skill.id
WHERE skill_levels.class_id = my_class
ORDER BY skill_levels.min_level ASC;
Need a skill tree? Add a column prerequisite_skill_id. And so on.
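A minimal runnable sketch of the same idea, assuming Python's built-in `sqlite3` module. Putting `prerequisite_skill_id` on the `skill` table, the class ids 100 and 200, the sample skills and the `learnable_skills` helper are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE skill (
        id   INTEGER PRIMARY KEY,
        name TEXT,
        prerequisite_skill_id INTEGER REFERENCES skill(id)
    );
    CREATE TABLE skill_levels (
        skill_id  INTEGER NOT NULL REFERENCES skill(id),
        class_id  INTEGER NOT NULL,
        min_level INTEGER NOT NULL
    );
    INSERT INTO skill VALUES
        (1, 'slash', NULL), (2, 'whirlwind', 1), (3, 'fireball', NULL);
    INSERT INTO skill_levels VALUES (1, 100, 1), (2, 100, 5), (3, 200, 1);
""")

def learnable_skills(class_id, level, known_skill_ids):
    """Skills this class can learn now: level reached, prerequisite known."""
    rows = conn.execute("""
        SELECT skill.id, skill.name, skill.prerequisite_skill_id
          FROM skill_levels
          JOIN skill ON skill_levels.skill_id = skill.id
         WHERE skill_levels.class_id = ? AND skill_levels.min_level <= ?
         ORDER BY skill_levels.min_level ASC
    """, (class_id, level)).fetchall()
    return [name for skill_id, name, prereq in rows
            if skill_id not in known_skill_ids
            and (prereq is None or prereq in known_skill_ids)]
```

For example, `learnable_skills(100, 5, {1})` offers the second-tier skill only because skill 1 is already known.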
Update:
Judging by the comments, it looks like a lot of people have a problem with XML. I know it's cool to bash it now and it does have its problems, but in this case I think it works. One of the other reasons that I chose it is that there are a ton of libraries for parsing it, so that can make life easier.
The other key concept is that the information is really non-relational. So yes, you could store the data in any particular example in a bunch of different tables with lots of joins, but that's a pain. But if I kept giving you slightly different examples, I bet you'd have to modify your design ad infinitum. I don't think adding tables and modifying complicated SQL statements is very much fun. So it's a little frustrating that @scheibk's comment has been voted up.
Original Post:
I think the problem you might have with storing quest information in the database is that it isn't really relational (that is, it doesn't really fit easily into a table). That might be why you're having trouble designing tables for the data.
On the other hand, if you put your quest information directly into code, that means you'll have to edit the code and recompile each time you want to add a quest. Lame.
So if I were you I might consider storing my quest information in an XML file or something similar. I know that's the generic solution for just about anything, but in this case it sounds right to me. XML is really made for storing non-relational and/or hierarchical data, just like the stuff you need to store for your quest.
Summary: You could come up with your own schema, create your XML file, and then load it at run time somehow (or even store the XML in the database).
Example XML:
<quests>
<quest name="Return Ring to Mordor">
<characterReqs>
<level>60</level>
<finishedQuests>
<quest name="Get Double Cheeseburger" />
<quest name="Go to Vegas for the Weekend" />
</finishedQuests>
<skills>
<skill name="nunchuks" />
<skill name="plundering" />
</skills>
<items>
<item name="genie's lamp" />
<item name="noise cancelling headphones for robin williams' voice" />
</items>
</characterReqs>
<steps>
<step number="1">Get to Mordor</step>
<step number="2">Throw Ring into Lava</step>
<step number="3">...</step>
<step number="4">Profit</step>
</steps>
</quest>
</quests>
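For what it's worth, loading a file like that at run time is only a few lines with a stock XML parser. Here's a minimal sketch in Python using the standard library's xml.etree.ElementTree. The trimmed-down XML, the load_quests function and the dict layout are my own assumptions for illustration, not part of any particular engine.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the example above, kept inline so the sketch runs on its own.
QUEST_XML = """
<quests>
  <quest name="Return Ring to Mordor">
    <characterReqs>
      <level>60</level>
      <finishedQuests>
        <quest name="Get Double Cheeseburger" />
      </finishedQuests>
      <skills>
        <skill name="nunchuks" />
        <skill name="plundering" />
      </skills>
    </characterReqs>
    <steps>
      <step number="1">Get to Mordor</step>
      <step number="2">Throw Ring into Lava</step>
    </steps>
  </quest>
</quests>
"""


def load_quests(xml_text):
    """Parse the quest XML into plain dicts keyed by quest name (hypothetical layout)."""
    quests = {}
    root = ET.fromstring(xml_text)
    # Only direct children of <quests>; the nested <quest> tags are prerequisites.
    for quest in root.findall("quest"):
        reqs = quest.find("characterReqs")
        steps = sorted(quest.findall("steps/step"), key=lambda s: int(s.get("number")))
        quests[quest.get("name")] = {
            "level": int(reqs.findtext("level", default="0")),
            "finished_quests": [q.get("name") for q in reqs.findall("finishedQuests/quest")],
            "skills": [s.get("name") for s in reqs.findall("skills/skill")],
            "steps": [s.text for s in steps],
        }
    return quests


quests = load_quests(QUEST_XML)
```

Once it's in a structure like that, checking a character against a quest is ordinary code, and adding a quest means editing the file rather than recompiling.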
It sounds like you're ready for general object oriented design (OOD) principles. I'm going to purposefully ignore the context (gaming, MMO, etc) because that really doesn't matter to how you do a design process. And me giving you links is less useful than explaining what terms will be most helpful to look up yourself, IMO; I'll put those in bold.
In OOD, the database schema comes directly from your system design, not the other way around. Your design will tell you what your base object classes are and which properties can live in the same table (the ones in 1:1 relationship with the object) versus which to make mapping tables for (anything with 1:n or n:m relationships - for example, one user has multiple stats, so it's 1:n). In fact, if you do the OOD correctly, you will have zero decisions to make regarding the final DB layout.
The "correct" way to do any OO mapping is learned as a multi-step process called "Database Normalization". The basics are just as I described: find the "arity" of the object relationships (1:1, 1:n, ...) and make mapping tables for the 1:n's and n:m's. For 1:n's you end up with two tables, the "base" table and a "base_subobjects" table (your "users" and "user_stats" are a good example), with the "foreign key" (the Id of the base object) as a column in the subobject mapping table. For n:m's, you end up with three tables: "base", "subobjects", and "base_subobjects_map", where the map has one column for the base Id and one for the subobject Id. This might be necessary in your example for N quests that can each have M requirements (so the requirement conditions can be shared among quests).
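If it helps to see both shapes side by side, here's a small sketch using Python's built-in sqlite3 module. The table and column names beyond "users" and "user_stats" (quests, requirements, quest_requirements_map and the sample rows) are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- 1:n -> two tables; the child row carries the parent's id as a foreign key
    CREATE TABLE users      (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE user_stats (user_id INTEGER NOT NULL REFERENCES users(id),
                             stat    TEXT    NOT NULL,
                             value   INTEGER NOT NULL,
                             PRIMARY KEY (user_id, stat));

    -- n:m -> three tables; the map table holds one id from each side
    CREATE TABLE quests       (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE requirements (id INTEGER PRIMARY KEY, description TEXT NOT NULL);
    CREATE TABLE quest_requirements_map (
        quest_id       INTEGER NOT NULL REFERENCES quests(id),
        requirement_id INTEGER NOT NULL REFERENCES requirements(id),
        PRIMARY KEY (quest_id, requirement_id));

    INSERT INTO users VALUES (1, 'alice');
    INSERT INTO user_stats VALUES (1, 'strength', 12), (1, 'agility', 9);
    INSERT INTO quests VALUES (1, 'Find the lost tomb'), (2, 'Slay the dragon');
    INSERT INTO requirements VALUES (1, 'Level 10+'), (2, 'Has a sword');
    INSERT INTO quest_requirements_map VALUES (1, 1), (2, 1), (2, 2);
""")

# 1:n lookup: one user, many stats.
stats = conn.execute(
    "SELECT stat, value FROM user_stats WHERE user_id = ? ORDER BY stat", (1,)
).fetchall()

# n:m lookup: one requirement shared by several quests, resolved through the map table.
shared = conn.execute("""
    SELECT q.name
    FROM quests q
    JOIN quest_requirements_map m ON m.quest_id = q.id
    WHERE m.requirement_id = ?
    ORDER BY q.name
""", (1,)).fetchall()
```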
That's 85% of what you need to know. The rest is how to handle inheritance, which I advise you to just skip unless you're masochistic. Now just go figure out how you want it to work before you start coding stuff up and the rest is cake.
The thread in #Shea Daniel's answer is on the right track: the specification for a quest is non-relational, and also includes logic as well as data.
Using XML or Lua are examples, but the more general idea is to develop your own Domain-Specific Language to encode quests. Here are a few articles about this concept, related to game design:
The Whimsy Of Domain-Specific Languages
Using a Domain Specific Language for Behaviors
Using Domain-Specific Modeling towards Computer Games Development Industrialization
You can store the block of code for a given quest into a TEXT field in your database, but you won't have much flexibility to use SQL to query specific parts of it. For instance, given the skills a character currently has, which quests are open to him? This won't be easy to query in SQL, if the quest prerequisites are encoded in your DSL in a TEXT field.
You can try to encode individual prerequisites in a relational manner, but it quickly gets out of hand. Relational and object-oriented just don't go well together. You can try to model it this way:
Chars <--- CharAttributes --> AllAttributes <-- QuestPrereqs --> Quests
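And then do a LEFT JOIN looking for any quests for which no prereqs are missing in the character's attributes. Here's a sketch (the column names are placeholders):
SELECT qp.quest_id
FROM QuestPrereqs AS qp
JOIN AllAttributes AS a
  ON a.attribute_id = qp.attribute_id
LEFT JOIN CharAttributes AS ca
  ON ca.attribute_id = a.attribute_id
 AND ca.char_id = :char_id  -- the character being checked
GROUP BY qp.quest_id
-- COUNT(column) ignores NULLs, so the counts are equal only when nothing is missing
HAVING COUNT(a.attribute_id) = COUNT(ca.attribute_id);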
But the problem with this is that now you have to model every aspect of your character that could be a prerequisite (stats, skills, level, possessions, quests completed) as some kind of abstract "Attribute" that fits into this structure.
This solves this problem of tracking quest prerequisites, but it leaves you with another problem: the character is modeled in a non-relational way, essentially an Entity-Attribute-Value architecture which breaks a bunch of relational rules and makes other types of queries incredibly difficult.
Not directly related to the design of your database, but a similar question was asked a few weeks back about class diagram examples for an RPG
I'm sure you can find something useful in there :)
Regarding your basic structure, you may (depending on the nature of your game) want to consider driving toward convergence of representation between player character and non-player characters, so that code that would naturally operate the same on either doesn't have to worry about the distinction. This would suggest, instead of having user and monster tables, having a character table that represents everything PCs and NPCs have in common, and then a user table for information unique to PCs and/or user accounts. The user table would have a character_id foreign key, and you could tell a player character row by the fact that a user row exists corresponding to it.
For representing quests in a model like yours, the way I would do it would look like:
quest_model
===============
id
name ['Quest for the Holy Grail', 'You Killed My Father', etc.]
etc.
quest_model_req_type
===============
id
name ['Minimum Level', 'Skill', 'Equipment', etc.]
etc.
quest_model_req
===============
id
quest_model_id
quest_model_req_type_id
value [10 (for Minimum Level), 'Horseback Riding' (for Skill), etc.]
quest
===============
id
quest_model_id
user_id
status
etc.
So a quest_model is the core definition of the quest structure; each quest_model can have 0..n associated quest_model_req rows, which are requirements specific to that quest model. Every quest_model_req is associated with a quest_model_req_type, which defines the general type of requirement: achieving a Minimum Level, having a Skill, possessing a piece of Equipment, and so on. The quest_model_req also has a value, which configures the requirement for this specific quest; for example, a Minimum Level type requirement might have a value of 20, meaning you must be at least level 20.
The quest table, then, is individual instances of quests that players are undertaking or have undertaken. The quest is associated with a quest_model and a user (or perhaps character, if you ever want NPCs to be able to do quests!), and has a status indicating where the progress of the quest stands, and whatever other tracking turns out useful.
This is a bare-bones structure that would, of course, have to be built out to accommodate the needs of particular games, but it should illustrate the direction I'd recommend.
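To make the role of quest_model_req_type a bit more concrete: each type ends up corresponding to one piece of checking logic in code, and the value column parameterizes it. A rough sketch in Python follows. The Character class, the CHECKS table and the row tuples (standing in for quest_model_req joined to quest_model_req_type) are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Character:
    level: int
    skills: set = field(default_factory=set)
    equipment: set = field(default_factory=set)


# One checker per quest_model_req_type.name. `value` is quest_model_req.value,
# assumed to be stored as text, so numeric types convert it themselves.
CHECKS = {
    "Minimum Level": lambda char, value: char.level >= int(value),
    "Skill":         lambda char, value: value in char.skills,
    "Equipment":     lambda char, value: value in char.equipment,
}


def unmet_requirements(char, req_rows):
    """req_rows: (type_name, value) pairs for one quest_model.
    Returns the pairs the character does not satisfy."""
    return [(t, v) for t, v in req_rows if not CHECKS[t](char, v)]


holy_grail_reqs = [("Minimum Level", "10"), ("Skill", "Horseback Riding")]
arthur = Character(level=12, skills={"Horseback Riding"})
patsy = Character(level=3, skills={"Coconut Clapping"})
```

An empty result from unmet_requirements means the quest can be offered; anything else doubles as the list of reasons to show the player.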
Oh, and since someone else threw around their credentials, mine are that I've been a hobbyist game developer on live, public-facing projects for 16 years now.
I'd be extremely careful of what you actually store in a DB, especially for an MMORPG. Keep in mind, these things are designed to be MASSIVE with thousands of users, and game code has to execute excessively quickly and send a crap-ton of data over the network, not only to the players on their home connections but also between servers on the back-end. You're also going to have to scale out eventually and databases and scaling out are not two things that I feel mix particularly well, particularly when you start sharding into different regions and then adding instance servers to your shards and so on. You end up with a whole lot of servers talking to databases and passing a lot of data, some of which isn't even relevant to the game at all (SQL text going to a SQL server is useless network traffic that you should cut down on).
Here's a suggestion: Limit your SQL database to storing only things that will change as players play the game. Monsters and monster stats will not change. Items and item stats will not change. Quest goals will not change. Don't store these things in a SQL database, instead store them in the code somewhere.
Doing this means that every server that ever lives will always know all of this information without ever having to query a database. Now, you don't store quests at all, you just store accomplishments of the player and the game programmatically determines the effects of those quests being completed. You don't waste data transferring information between servers because you're only sending event IDs or something of that nature (you can optimize the data you pass by using just enough bits to represent all the event IDs, and this will cut down on network traffic. It may seem insignificant, but nothing is insignificant in massive network apps).
Do the same thing for monster stats and item stats. These things don't change during gameplay so there's no need to keep them in a DB at all and therefore this information NEVER needs to travel over the network. The only thing you store is the ID of the items or monster kills or anything like that which is non-deterministic (i.e. it can change during gameplay in a way which you can't predict).
You can have dedicated item servers or monster stat servers or something like that, and you can add those to your shards if you end up having huge numbers of these things that occupy too much memory. Then just pass the data that's necessary for a particular quest or area to the instance server that is handling that thing to cut down further on space, but keep in mind that this will up the amount of data you need to pass down the network to spool up a new instance server, so it's a trade-off. As long as you're aware of the consequences of this trade-off, you can use good judgement and decide what you want to do.
Another possibility is to limit instance servers to a particular quest/region/event/whatever and only equip each one with enough information for the thing it's responsible for, but this is more complex and potentially limits your scaling out since resource allocation will become static instead of dynamic (if you have 50 servers, one per quest, and suddenly everyone goes on the same quest, you'll have 49 idle servers and one really swamped server). Again, it's a trade-off so be sure you understand it and make good choices for your application.
Once you've identified exactly what information in your game is non-deterministic, then you can design a database around that information. That becomes a bit easier: players have stats, players have items, players have skills, players have accomplishments, etc, all fairly easy to map out. You don't need descriptions for things like skills, accomplishments, items, etc, or even their effects or names or anything since the server can determine all that stuff for you from the ID's of those things at runtime without needing a database query.
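To illustrate the split, here's a tiny Python sketch. The ITEMS table, the sample IDs and describe_inventory are invented for the example; the point is only that the database hands back IDs and everything else is resolved from data the server already has in memory.

```python
# Static game data lives in code (or a file loaded at startup), not in the database.
ITEMS = {
    1: {"name": "Rusty Sword", "damage": 3},
    2: {"name": "Sword of 1000 Truths", "damage": 1000},
}

# What the database would return for one player: nothing but item IDs.
player_item_ids = [2, 1, 1]


def describe_inventory(item_ids):
    """Resolve stored IDs against the in-code table; no query needed for names or stats."""
    return [ITEMS[i]["name"] for i in item_ids]


def total_damage(item_ids):
    """Derived values are computed at runtime from the static table as well."""
    return sum(ITEMS[i]["damage"] for i in item_ids)
```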
Now, a lot of this probably sounds like overkill to you. After all, a good database can do queries very rapidly. However, your bandwidth is extremely precious, even in the data center, so you need to limit your use of it to only what is absolutely necessary to send and only send that data when it's absolutely necessary that it be sent.
Now, for representing quests in code, I would consider the specification pattern (http://en.wikipedia.org/wiki/Specification_pattern). This will allow you to easily build up quest goals in terms of what events are needed to ensure that the specification for completing that quest is met. You can then use Lua (or something) to define your quests as you build the game so that you don't have to make massive code changes and rebuild the whole damn thing just so that you have to kill 11 monsters instead of 10 to get the Sword of 1000 truths in a particular quest. How to actually do something like that is, I think, beyond the scope of this answer and starts to hit the edge of my knowledge of game programming, so maybe someone else on here can help you out if you choose to go that route.
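For a feel of what the specification pattern looks like, here's a stripped-down sketch in Python (not Lua, and not how any real engine does it). Spec, KilledAtLeast, HasItem and the dict-shaped player are all names I made up for the example.

```python
class Spec:
    """A yes/no rule about a player that can be combined with other rules."""

    def is_satisfied_by(self, player):
        raise NotImplementedError

    def __and__(self, other):
        return _Combined(all, self, other)

    def __or__(self, other):
        return _Combined(any, self, other)


class _Combined(Spec):
    """Several specs joined with all() or any()."""

    def __init__(self, combine, *specs):
        self.combine, self.specs = combine, specs

    def is_satisfied_by(self, player):
        return self.combine(s.is_satisfied_by(player) for s in self.specs)


class KilledAtLeast(Spec):
    def __init__(self, monster, count):
        self.monster, self.count = monster, count

    def is_satisfied_by(self, player):
        return player["kills"].get(self.monster, 0) >= self.count


class HasItem(Spec):
    def __init__(self, item):
        self.item = item

    def is_satisfied_by(self, player):
        return self.item in player["items"]


# Changing 10 to 11 here is a one-token edit to the quest definition,
# not a change to the checking logic.
sword_quest = KilledAtLeast("boar", 10) & (HasItem("torch") | HasItem("lantern"))

veteran = {"kills": {"boar": 11}, "items": {"lantern"}}
rookie = {"kills": {"boar": 2}, "items": {"torch"}}
```

The quest definition at the bottom is the part you'd want to move out into a script or data file; the small classes above it stay compiled into the game.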
Also, I know I used a lot of terms in this answer, please ask if there are any that you are unfamiliar with and I can explain them.
Edit: didn't notice your addition about craftable items. I'm going to assume that these are things that a player can create specifically in the game, like custom items. If a player can continually change these items, then you can just combine the attributes of what they're crafted as at runtime but you'll need to store the ID of each attribute in the DB somewhere. If you make a finite number of things you can add on (like gems in Diablo II) then you can eliminate a join by just adding that number of columns to the table. If there are a finite number of items that can be crafted and a finite number of ways that different things can be joined together into new items, then when certain items are combined, you needn't store the combined attributes; it just becomes a new item which has been defined at some point by you already. Then, they just have that item instead of its components. If you clarify the behavior your game is to have I can add additional suggestions if that would be useful.
I would approach this from an Object Oriented point of view, rather than a Data Centric point of view. It looks like you might have quite a lot of (possibly complex) objects - I would recommend getting them modeled (with their relationships) first, and relying on an ORM for persistence.
When you have a data-centric problem, the database is your friend. What you have done so far seems to be quite right.
On the other hand, the other problems you mention seem to be behaviour-centric. In this case, an object-oriented analysis and solution will work better.
For example:
Create a quest class with specificQuest child classes. Each child should implement a bool HasRequirements(Player player) method.
Another option is some sort of rules engine (Drools, for example if you are using Java).
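A bare-bones version of the quest class idea, sketched in Python rather than a language with a bool HasRequirements(Player player) signature. The Player fields and the two quest classes are invented for the example.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class Player:
    level: int
    completed: set = field(default_factory=set)


class Quest(ABC):
    name = "unnamed"

    @abstractmethod
    def has_requirements(self, player: Player) -> bool:
        """True if this player may start the quest."""


class FindTheLostTomb(Quest):
    name = "Find the Lost Tomb"

    def has_requirements(self, player):
        return player.level >= 5


class GoToTheHypermarket(Quest):
    name = "Go to the Hypermarket"

    def has_requirements(self, player):
        # Arbitrary logic is fine here; that's the appeal of doing it in code.
        return player.level >= 10 and "Find the Lost Tomb" in player.completed


def available_quests(player, quests):
    return [q.name for q in quests if q.has_requirements(player)]


ALL_QUESTS = [FindTheLostTomb(), GoToTheHypermarket()]
```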
If I were designing a database for such a situation, I might do something like this:
Quest
[quest properties like name and description]
reqItemsID
reqSkillsID
reqPlayerTypesID
RequiredItems
ID
item
RequiredSkills
ID
skill
RequiredPlayerTypes
ID
type
In this, the ID's map to the respective tables then you retrieve all entries under that ID to get the list of required items, skills, what have you. If you allow dynamic creation of items then you should have a mapping to another table that contains all possible items.
Another thing to keep in mind is normalization. There's a long article here, but I've condensed the first three levels into the following, more or less:
First normal form means that there are no database entries where a specific field has more than one item in it.
Second normal form means that if you have a composite primary key, all other fields depend on the entire key, not just part of it.
Third normal form means that no non-key field depends on another non-key field.
[Disclaimer: I have very little experience with SQL databases, and am new to this field. I just hope I'm of help.]
I've done something sort of similar and my general solution was to use a lot of meta data. I'm using the term loosely to mean that any time I needed new data to make a given decision(allow a quest, allow using an item etc.) I would create a new attribute. This was basically just a table with an arbitrary number of values and descriptions. Then each character would have a list of these types of attributes.
Ex: List of Kills, Level, Regions visited, etc.
The two things this does to your dev process are:
1) Every time there's an event in the game you need to have a big old switch block that checks all these attribute types to see if something needs updating
2) Every time you need some data, check all your attribute tables BEFORE you add a new one.
I found this to be a good rapid development strategy for a game that grows organically (not completely planned out on paper ahead of time) - but its one big limitation is that your past/current content (levels/events etc) will not be compatible with future attributes - i.e. that map won't give you a region badge because there were no region badges when you coded it. This of course requires you to update past content when new attributes are added to the system.
Just some little points for your consideration:
1) Always try to make your "get quest" requirements simple, and your "finish quest" requirements complicated.
Part 1 can be done by arranging your quests in a hierarchical order:
Example:
QuestA : (Kill Raven the demon) (quest req: Lvl1)
QuestA.1 : Save "unknown" in the forest to obtain some info.. (quest req : QuestA)
QuestA.2 : Craft the sword of Crystal ... etc.. (quest req : QuestA.1 == Done)
QuestA.3 : ... etc.. (quest req : QuestA.2 == Done)
QuestA.4 : ... etc.. (quest req : QuestA.3 == Done)
etc...
QuestB (Find the lost tomb) (quest req : ( QuestA.status == Done) )
QuestC (Go To the demons Hypermarket) ( Quest req: ( QuestA.status == Done && player.level == 10) )
etc....
Doing this would save you lots of data fields/table joins.
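A quick sketch of how little it takes to check such a hierarchy, in Python. It simplifies things by giving every quest at most one prerequisite quest plus a minimum level, and it treats the level as "at least"; the QUESTS table and the function name are made up.

```python
# quest name -> (prerequisite quest or None, minimum level)
QUESTS = {
    "QuestA":   (None, 1),
    "QuestA.1": ("QuestA", 1),
    "QuestA.2": ("QuestA.1", 1),
    "QuestB":   ("QuestA.2", 1),
    "QuestC":   ("QuestA.2", 10),
}


def available(done, level):
    """Quests the player can pick up now: not done yet, prerequisite done, level high enough."""
    return sorted(
        name
        for name, (prereq, min_level) in QUESTS.items()
        if name not in done
        and (prereq is None or prereq in done)
        and level >= min_level
    )
```

With a single "previous quest" field per quest there is nothing to join; the whole "get quest" check is one lookup per quest.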
ADDITIONAL THOUGHTS:
if you use the above system, you can add an extra Reward field to your quest table called "enableQuests" and add the names of the quests that need to be enabled..
Logically.. you'd have an "enabled" field assigned to each quest..
2) A minor solution for your crafting problem: create crafting recipes, items that have the to-be-crafted item's crafting requirements stored in them..
so when a player tries to craft an item.. he needs to buy a recipe first.. then try crafting..
a simple example of such item Desc would be:
ItemName: "Legendary Sword of the dead"
Craft level req. : 75
Items required:
Item_1 : Blade of the dead
Item_2 : A cursed seal
item_3 : Holy Gemstone of the dead
etc...
and when he presses the "craft" Action, you can parse it and compare against his inventory/craft box...
so your crafting DB will have only 1 field (or 2 if you want to add a crafting level req., though it will already be included in the recipe).
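The "parse it and compare against his inventory" step could look roughly like this in Python. The recipe is assumed to be already parsed into a dict, and the key names and missing_for are invented for the example.

```python
from collections import Counter

# The recipe item from the example above, after parsing (hypothetical keys).
RECIPE = {
    "item_name": "Legendary Sword of the dead",
    "craft_level_req": 75,
    "items_required": ["Blade of the dead", "A cursed seal", "Holy Gemstone of the dead"],
}


def missing_for(recipe, craft_level, inventory):
    """Return what blocks the craft as a list of problems; empty means go ahead."""
    problems = []
    if craft_level < recipe["craft_level_req"]:
        problems.append("craft level too low")
    # Counter subtraction keeps only positive counts, i.e. what is still missing.
    needed = Counter(recipe["items_required"]) - Counter(inventory)
    for item, qty in needed.items():
        problems.append(f"missing {qty} x {item}")
    return problems
```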
ADDITIONAL THOUGHTS:
Such items, can be stored in xml format in the table .. which would make it much easier to parse...
3) A similar XML System can be applied to Your quest system.. to implement quest-ending requirements..

Please help explain if I'm destroying my DB Schema for the sake of performance :(

I've had a database in production for nearly 3 years, on SQL Server 2008 (it was '05 before that). It has been fine, but it isn't very performant. So I'm tweaking the schema and queries to help speed some things up. Also, a score of the main tables contain around 1-3 million rows each (to give you an estimate of the sizes).
Here's a sample database diagram (Soz, under NDA so i can't display the original) :-
[Image: sample database diagram] http://img11.imageshack.us/img11/4608/dbschemaexample.png
Things to note (which are directly related to my problem) :-
A vehicle can have 0 (NULL) or 1 Radio. (Left Outer Join)
A vehicle can have 0 (NULL) or 1 Cupholder (Left Outer Join)
A vehicle has 1 Tyre Type (Inner Join).
Firstly, this looks like a normalised database schema. I suck at DB theory, so I'm guessing this is 3NF (at least) ... famous last words :)
Now, this is killing my database performance because these two outer joins and inner join are getting called a lot AND there's also a few more joins in many statements.
To try and fix this, I thought I might try an indexed view. Creating the view is a piece of cake. But indexing it doesn't work -> you can't create indexed views with outer joins OR self referencing tables (also another prob :( ).
So, i've cried for hours (and /wrists, dyed hair and wrote an emo song about it and put it on myfailspace) and did the following...
Added a new row into each of the 'optional' outer join tables (in this example, Radios and CupHolders). ID = 0, rest of the data = 'Unknown Blah' or 0's.
Updated the parent tables, so that any NULL values are now 0.
Changed the relationships from outer joins to inner joins.
Now, this works. I can even make my indexed view, which is very fast now.
So ... i'm in pain. This just goes against everything I've been taught. I feel dirty. Alone. Infected.
Is this a bad thing to do? Is this a common scenario of denormalizing a database for the sake of performance?
I would love some thoughts on this, please :)
PS. Those images are random Google finds -- so not me.
Null values generally are not used in indexes. What you've done is to provide a sentinel value so that the column always has a value, which allows your indexes to be used more effectively.
You didn't change the structure of your database either, so I wouldn't call this denormalizing. I've done that with date values, where a null "end date" denoted not ended yet. Instead I made it a known date way in the future, which allowed for indexing.
I think this is fine.
Databases should always be designed and initially implemented in 3NF. But the world is a place of reality, not ideals, and it's okay to revert to 2NF (or even 1NF) for performance reasons. Don't beat yourself up about it, pragmatism beats dogmatism in the real world all the time.
Your solution, if it improves performance, is a good one. The idea of having an actual radio (for example), manufactured by nobody and having no features, is not a bad one - it's been done a lot before, believe me :-) The only reason you would use that field as NULL was to see which vehicles have no radio and there's little difference between these queries:
select Registration from vehicles where RadioId is null
select Registration from vehicles where RadioId = 0
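To see the whole change end to end, here's a small sketch. It uses SQLite through Python's sqlite3 module purely so it runs anywhere; it is not SQL Server, and it says nothing about indexed views. The table layout is guessed from the question, and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Radios   (ID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
    CREATE TABLE Vehicles (ID INTEGER PRIMARY KEY,
                           Registration TEXT NOT NULL,
                           RadioId INTEGER REFERENCES Radios(ID));
    INSERT INTO Radios   VALUES (1, 'AM/FM Radio');
    INSERT INTO Vehicles VALUES (1, 'ABC123', 1), (2, 'XYZ789', NULL);
""")

INNER_JOIN = """
    SELECT v.Registration, r.Name
    FROM Vehicles v
    JOIN Radios r ON r.ID = v.RadioId
    ORDER BY v.ID
"""

# With NULLs in RadioId, an inner join silently drops the radio-less vehicle.
before = conn.execute(INNER_JOIN).fetchall()

conn.executescript("""
    INSERT INTO Radios VALUES (0, 'Unknown Radio');         -- sentinel row
    UPDATE Vehicles SET RadioId = 0 WHERE RadioId IS NULL;  -- NULL -> 0
""")

# Same inner join, but now every vehicle matches some radio row.
after = conn.execute(INNER_JOIN).fetchall()

# "Which vehicles have no radio?" becomes an equality test instead of IS NULL.
no_radio = conn.execute(
    "SELECT Registration FROM Vehicles WHERE RadioId = 0"
).fetchall()
```

The one thing to watch for afterwards is any report or count that joins to Radios: the 'Unknown Radio' row will show up in it unless you filter ID = 0 out.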
My first thought was to simply combine the four tables into one and hang the duplicate data issue. Most problems with DBMSs stem from poor performance rather than low storage space.
Maybe keep that as your fallback position if your current de-normalized schema becomes slow as well.
"...So i'm tweaking the schema and queries to help speed some things up..." - I would beg to differ about this. It seems that you're slowing things down. (Just kidding.)
I like the Database Programmer blog. He has two columns for and against normalization that you might find helpful:
http://database-programmer.blogspot.com/2008/10/argument-for-normalization.html
http://database-programmer.blogspot.com/2008/10/argument-for-denormalization.html
I'm not a DBA, but I think the evidence is in front of your eyes: Performance is worse. I don't see what splitting these 1:1 relationships into separate tables is buying you, but I'll be happy to take instruction.
Before I changed anything, I'd ask SQL Server for the execution plan of every query that was slow (SET SHOWPLAN_ALL ON, or the graphical plan in Management Studio; SQL Server has no literal EXPLAIN PLAN statement the way Oracle does) and use that information to see exactly what should be changed. Don't guess because a normalization guru told you so. Get the data to back up what you're doing. What you're doing sounds like optimizing middle tier code without profiling. Gut feelings aren't very accurate.
I'm running into the same issue of performance vs academic excellence. We have a large view on a customer database with 300 columns and 91000 records. We use outer joins to create the view and the performance is pretty bad. We have considered changing to inner joins by putting in dummy records with a value of zero in the columns we join on (instead of null) to enable a unique index on the view.
I have to agree that if performance is important, sometimes strange things have to be done to make it happen. Ultimately those who pay our bills don't care if the architecture is perfect.

Resources