Best practices for consistent and comprehensive address storage in a database [closed] - database

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 4 years ago.
Improve this question
Are there any best practices (or even standards) to store addresses in a consistent and comprehensive way in a database ?
To be more specific, I believe at this stage that there are two cases for address storage :
you just need to associate an address to a person, a building or any item (the most common case). Then a flat table with text columns (address1, address2, zip, city) is probably enough. This is not the case I'm interested in.
you want to run statistics on your addresses : how many items in a specific street, or city or... Then you want to avoid misspellings of any sorts, and ensure consistency. My question is about best practices in this specific case : what are the best ways to model a consistent address database ?
A country specific design/solution would be an excellent start.
ANSWER : There does not seem to exist a perfect answer to this question yet, but :
xAL, as suggested by Hank, is the closest thing to a global standard that popped up. It seems to be quite an overkill though, and I am not sure many people would want to implement it in their database...
To start one's own design (for a specific country), Dave's link to the Universal Postal Union (UPU) site is a very good starting point.
As for France, there is a norm (non official, but de facto standard) for addresses, which bears the lovely name of AFNOR XP Z10-011 (french only), and has to be paid for. The UPU description for France is based on this norm.
I happened to find the equivalent norm for Sweden : SS 613401.
At European level, some effort has been made, resulting in the norm EN 14142-1. It is obtainable via CEN national members.

I've been thinking about this myself as well. Here are my loose thoughts so far, and I'm wondering what other people think.
xAL (and its sister that includes personal names, XNAL) is used by both Google and Yahoo's geocoding services, giving it some weight. But since the same address can be described in xAL in many different ways--some more specific than others--then I don't see how xAL itself is an acceptable format for data storage. Some of its field names could be used, however, but in reality the only basic format that can be used among the 16 countries that my company ships to is the following:
enum address-fields
{
name,
company-name,
street-lines[], // up to 4 free-type street lines
county/sublocality,
city/town/district,
state/province/region/territory,
postal-code,
country
}
That's easy enough to map into a single database table, just allowing for NULLs on most of the columns. And it seems that this is how Amazon and a lot of organizations actually store address data. So the question that remains is how should I model this in an object model that is easily used by programmers and by any GUI code. Do we have a base Address type with subclasses for each type of address, such as AmericanAddress, CanadianAddress, GermanAddress, and so forth? Each of these address types would know how to format themselves and optionally would know a little bit about the validation of the fields.
They could also return some type of metadata about each of the fields, such as the following pseudocode data structure:
structure address-field-metadata
{
field-number, // corresponds to the enumeration above
field-index, // the order in which the field is usually displayed
field-name, // a "localized" name; US == "State", CA == "Province", etc
is-applicable, // whether or not the field is even looked at / valid
is-required, // whether or not the field is required
validation-regex, // an optional regex to apply against the field
allowed-values[] // an optional array of specific values the field can be set to
}
In fact, instead of having individual address objects for each country, we could take the slightly less object-oriented approach of having an Address object that eschews .NET properties and uses an AddressStrategy to determine formatting and validation rules:
object address
{
set-field(field-number, field-value),
address-strategy
}
object address-strategy
{
validate-field(field-number, field-value),
cleanse-address(address),
format-address(address, formatting-options)
}
When setting a field, that Address object would invoke the appropriate method on its internal AddressStrategy object.
The reason for using a SetField() method approach rather than properties with getters and setters is so that it is easier for code to actually set these fields in a generic way without resorting to reflection or switch statements.
You can imagine the process going something like this:
GUI code calls a factory method or some such to create an address based on a country. (The country dropdown, then, is the first thing that the customer selects, or has a good guess pre-selected for them based on culture info or IP address.)
GUI calls address.GetMetadata() or a similar method and receives a list of the AddressFieldMetadata structures as described above. It can use this metadata to determine what fields to display (ignoring those with is-applicable set to false), what to label those fields (using the field-name member), display those fields in a particular order, and perform cursory, presentation-level validation on that data (using the is-required, validation-regex, and allowed-values members).
GUI calls the address.SetField() method using the field-number (which corresponds to the enumeration above) and its given values. The Address object or its strategy can then perform some advanced address validation on those fields, invoke address cleaners, etc.
There could be slight variations on the above if we want to make the Address object itself behave like an immutable object once it is created. (Which I will probably try to do, since the Address object is really more like a data structure, and probably will never have any true behavior associated with itself.)
Does any of this make sense? Am I straying too far off of the OOP path? To me, this represents a pretty sensible compromise between being so abstract that implementation is nigh-impossible (xAL) versus being strictly US-biased.
Update 2 years later: I eventually ended up with a system similar to this and wrote about it at my defunct blog.
I feel like this solution is the right balance between legacy data and relational data storage, at least for the e-commerce world.

I'd use an Address table, as you've suggested, and I'd base it on the data tracked by xAL.

In the UK there is a product called PAF from Royal Mail
This gives you a unique key per address - there are hoops to jump through, though.

I basically see 2 choices if you want consistency:
Data cleansing
Basic data table look ups
Ad 1. I work with the SAS System, and SAS Institute offers a tool for data cleansing - this basically performs some checks and validations on your data, and suggests that "Abram Lincoln Road" and "Abraham Lincoln Road" be merged into the same street. I also think it draws on national data bases containing city-postal code matches and so on.
Ad 2. You build up a multiple choice list (ie basic data), and people adding new entries pick from existing entries in your basic data. In your fact table, you store keys to street names instead of the street names themselves. If you detect a spelling error, you just correct it in your basic data, and all instances are corrected with it, through the key relation.
Note that these options don't rule out each other, you can use both approaches at the same time.

In the US, I'd suggest choosing a National Change of Address vendor and model the DB after what they return.

The authorities on how addresses are constructed are generally the postal services, so for a start I would examine the data elements used by the postal services for the major markets you operate in.
See the website of the Universal Postal Union for very specific and detailed information on international postal address formats:http://www.upu.int/post_code/en/postal_addressing_systems_member_countries.shtml

"xAl is the closest thing to a global standard that popped up. It seems to be quite an overkill though, and I am not sure many people would want to implement it in their database..."
This is not a relevant argument. Implementing addresses is not a trivial task if the system needs to be "comprehensive and consistent" (i.e. worldwide). Implementing such a standard is indeed time consuming, but to meet the specified requirement nevertheless mandatory.

normalize your database schema and you'll have the perfect structure for correct consistency. and this is why:
http://weblogs.sqlteam.com/mladenp/archive/2008/09/17/Normalization-for-databases-is-like-Dependency-Injection-for-code.aspx

I asked something quite similar earlier: Dynamic contact information data/design pattern: Is this in any way feasible?.
The short answer: Storing adderres or any kind of contact information in a database is complex. The Extendible Address Language (xAL) link above has some interesting information that is the closest to a standard/best practice that I've come accross...

Related

A topic to experienced database architects [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 9 years ago.
I face the following problem.
I'm creating a database for (say) human beings' info. All the human beings may be classified in one of the three categories: adult female, adult male, child. It is clear that the parameters like "height" and "weight" are applicable to all of the categories. The parameter "number of children" is applicable only to adults, while the parameter "number of pregnancies" is applicable to females only. Also, each parameter may be classified as mandatory or optional depending on the category (for example, for adults the parameter "number of ex-partners" is optional).
When I load (say) "height" and "weight", I check whether the info in these two fields is self-consistent. I.e., I mark as a mistake the record which has height=6'4'' and weight=10 lb (obviously, this is physically impossible). I have several similar verification rules.
When I insert a record about a human being, I need to reflect the following characteristics of the info:
the maximum possible info for the category of this particular human being (including all the optional parameters).
the required minimum of information for the category (i.e., mandatory fields only)
what has actually been inserted for this particular human being (i.e., it is possible to insert whatever I have for this person no matter whether it is smaller than the amount of required minimum of info or not). The non-trivial issue here is that a field "XXX" may have NULL value because I have never inserted anything there OR because I have intentionally inserted exactly NULL value. The same logic with the fields that have a default value. So somewhere should be reflected that I have processed this particular field.
what amount of inserted information has been verified (i.e., even if I load some 5 fields, I can check for self-consistency only 3 fields while ignoring the 2 left).
So my question is how to technically organize it. Currently, all these required features are either hardcoded with no unified logic or broken into completely independent blocks. I need to create a unified approach.
I have some naive ideas in my head in this regard. For example, for each category of human beings, I can create and store a list of possible fields (I call it "template"). A can mark those fields that are mandatory.
When I insert a record about a human being, I copy the template and mark what fields from this templates have actually been processed. At the next stage, I can mark in this copy of the template those fields that will be currently verified.
The module of verification is specially corrected in the following way: for each verification procedure I create a list of fields that are being used in this particular verification procedure. Then I call only those verification procedures that have those fields that are actually marked "to be verified" in the copy of the template for the particular human being that is to be verified (see the previous passage).
As you see, this is the most straightforward way to solve this problem. But my guess is that there are a lot of quite standardized approaches that I'm not aware of. I really doubt that I'm the first in the world to solve such a problem. I don't like my solution because it is really painfull to write the code to correctly reflect in this copied template all the "updates" happening with a record.
So, I ask you to share your opinion how would you solve this problem.
I think there are two questions here:
how do I store polymorphic data in a database?
how do I validate complex business rules?
You should address them separately - trying to solve both at once is probably too hard.
There are a few approaches to polymorphic data in RDBMSes - ORMs use the term inheritance mapping, for instance. The three solutions here - table per class hierarchy, table per subclass and table per concrete class - are "pure" relational solutions. You can also use the "Entity-Attribute-Value" design, or use a document approach (storing data in structured formats such as XML or JSON) - these are not "pure" relational options, but have their place.
Validating complex business rules is often done using rule engines - these are super cool bits of technology, but you have to be sure that your problem really fits with their solution - deciding to invest in a rules engine means your project changes into a rules engine project, not a "humans" project. Alternatively, most mainstream solutions to this embody the business logic about the entities in the application's business logic layer. It sounds like you're outgrowing this.
This exact problem, both in health terms and in terms of a financial instrument, is used as a primary example in Martin Fowlers book Analysis Patterns. It is an extensive topic. As #NevilleK says you are trying to deal with two questions, and it is best to deal with them separately. One ultra simplified way of approaching these problems is:
1 Storage of polymorphic data - only put mandatory data that is common to the category in the category table. For optional data put these in a separate table in 1-1 relationship to the category table. Entries are made in these optional tables only if there is a value to be recorded. The record of the verification of the data can also be put in these additional tables.
2 Validate complex business rules - it is useful to consider the types of error that can arise. There are a number of ways of classifying the errors but the one I have found most useful is (a) type errors where one can tell that the value is in error just by looking at the data - eg 1980-02-30. (b) context errors where one can detect the error only by reference to previously captured date - eg DoB 1995-03-15, date of marriage 1996-08-26. and (c) lies to the system - where the data type is ok; the context is ok; but the information can only be detected as incorrect at a later date when more information comes to light eg if I register my DoB as 1990-12-31, when it is something different. This latter class of error typically has to be dealt with by procedures outside the system being developed.
I would use the Party Role pattern (Silverston):
Party
id
name
Individual : Party
current_weight
current_height
PartyRole
id
party_id
from_date
to_date (nullable)
AdultRole : PartyRole
number_of_children
FemaleAdultRole : AdultRole
number_of_pregnancies
Postgres has a temporal extension such that you could enforce that a party could only play one role at a time (yet maintain their role histories).
Use table inheritance. For simplicity use Single Table Inheritance (has nulls), for no nulls use Class Table Inheritance.

Which of these data models is the most correct?

I'm in the process of creating the data model for an application I will be developing, and I was hoping to get some feedback on part of the model. The app will be a complete redevelopment of something that was created in Lotus Notes, and one of the main purposes of the redevelopment is to move toward a relational data storage layer.
The application is focused on managing Things. The requirements/constraints of the application are:
A Thing must have an associated Location.
A Location could be for example 'McDonalds', or 'Melbourne Uni, Building AK, Room 301' where 'Melbourne Uni', 'Building AK', and 'Room 301' are seperate related Locations.
(at least) 3 levels/tiers of Location must exist
There must be a provision for 'Other' locations, so that users can enter free text for a location that does not exist in the database
So I've come up with 4 different implementations of the above, but I don't really have enough DBA experience to know which one is the most correct.
Location / Thing relational model
Any thoughts and/or suggestions on this would be greatly appreciated!
Both option 1s are likely to prove inflexible and difficult to amend if your estimate of three levels proves to be insufficient.
In your option 2s, the entity that looks dubious is ThingOtherLocations. Anything that (from its name) is concatenating two different concepts is automatically suspect.If it is the case that you do have two separate concepts here, then the structure of option 2b does not need either OtherLocation or ThingOtherLocation. I suspect that the relation you are trying to represent (gets its name from) is actually another relation between locations - though I am not clear on this.
EDIT
In the light of your clarification of the ThingOtherLocations, I would suggest that you treat the text associated with Other simply as a new location, and store the new location along with other locations. There does not seem to be any reason to include special database handling for these cases.
EDIT
To deal with the child location issue, you might like to consider Joe Celko's work on nested sets. The primary reference for this is:
Joe Celko.
Joe Celko's Trees and Hierarchies in SQL for Smarties,
(The Morgan Kaufmann Series in Data Management Systems)
ISBN 1-55860-920-2

Designing tables for storing various requirements and stats for multiplayer game

Original Question:
Hello,
I am creating very simple hobby project - browser based multiplayer game. I am stuck at designing tables for storing information about quest / skill requirements.
For now, I designed my tables in following way:
table user (basic information about users)
table stat (variety of stats)
table user_stats (connecting each user with stats)
Another example:
table monsters (basic information about npc enemies)
table monster_stats (connecting monsters with stats, using the same stat table from above)
Those were the simple cases. I must admit, that I am stuck while designing requirements for different things, e.g quests. Sample quest A might have only minimum character level requirement (and that is easy to implement) - but another one, quest B has multitude of other reqs (finished quests, gained skills, possessing specific items, etc) - what is a good way of designing tables for storing this kind of information?
In a similar manner - what is an efficient way of storing information about skill requirements? (specific character class, min level, etc).
I would be grateful for any help or information about creating database driven games.
Edit:
Thank You for the answers, yet I would like to receive more. As I am having some problems designing an rather complicated database layout for craftable items, I am starting a max bounty for this question.
I would like to receive links to articles / code snippets / anything connected with best practices of designing databases for storing game data (an good example of this kind of information is availibe on buildingbrowsergames.com).
I would be grateful for any help.
I'll edit this to add as many other pertinent issues as I can, although I wish the OP would address my comment above. I speak from several years as a professional online game developer and many more years as a hobbyist online game developer, for what it's worth.
Online games imply some sort of persistence, which means that you have broadly two types of data - one is designed by you, the other is created by the players in the course of play. Most likely you are going to store both in your database. Make sure you have different tables for these and cross-reference them properly via the usual database normalisation rules. (eg. If your player crafts a broadsword, you don't create an entire new row with all the properties of a sword. You create a new row in the player_items table with the per-instance properties, and refer to the broadsword row in the item_types table which holds the per-itemtype properties.) If you find a row of data is holding some things that you designed and some things that the player is changing during play, you need to normalise it out into two tables.
This is really the typical class/instance separation issue, and applies to many things in such games: a goblin instance doesn't need to store all the details of what it means to be a goblin (eg. green skin), only things pertinent to that instance (eg. location, current health). Some times there is a subtlety to the act of construction, in that instance data needs to be created based on class data. (Eg. setting a goblin instance's starting health based upon a goblin type's max health.) My advice is to hard-code these into your code that creates the instances and inserts the row for it. This information only changes rarely since there are few such values in practice. (Initial scores of depletable resources like health, stamina, mana... that's about it.)
Try and find a consistent terminology to separate instance data from type data - this will make life easier later when you're patching a live game and trying not to trash the hard work of your players by editing the wrong tables. This also makes caching a lot easier - you can typically cache your class/type data with impunity because it only ever changes when you, the designer, pushes new data up there. You can run it through memcached, or consider loading it all at start up time if your game has a continuous process (ie. is not PHP/ASP/CGI/etc), etc.
Remember that deleting anything from your design-side data is risky once you go live, since player-generated data may refer back to it. Test everything thoroughly locally before deploying to the live server because once it's up there, it's hard to take it down. Consider ways to be able to mark rows of such data as removed in a safe fashion - maybe a boolean 'live' column which, if set to false, means it just won't show up in the typical query. Think about the impact on players if you disable items they earned (and doubly if these are items they paid for).
The actual crafting side can't really be answered without knowing how you want to design your game. The database design must follow the game design. But I'll run through a trivial idea. Maybe you will want to be able to create a basic object and then augment it with runes or crystals or whatever. For that, you just need a one-to-many relationship between item instance and augmentation instance. (Remember, you might have item type and augmentation type tables too.) Each augmentation can specify a property of an item (eg. durability, max damage done in combat, weight) and a modifier (typically as a multiplier, eg. 1.1 to add a 10% bonus). You can see my explanation for how to implement these modifying effects here and here - the same principles apply for temporary skill and spell effects as apply for permanent item modification.
For character stats in a database driven game, I would generally advise to stick with the naïve approach of one column (integer or float) per statistic. Adding columns later is not a difficult operation and since you're going to be reading these values a lot, you might not want to be performing joins on them all the time. However, if you really do need the flexibility, then your method is fine. This strongly resembles the skill level table I suggest below: lots of game data can be modelled in this way - map a class or instance of one thing to a class or instance of other things, often with some additional data to describe the mapping (in this case, the value of the statistic).
Once you have these basic joins set up - and indeed any other complex queries that result from the separation of class/instance data in a way that may not be convenient for your code - consider creating a view or a stored procedure to perform them behind the scenes so that your application code doesn't have to worry about it any more.
Other good database practices apply, of course - use transactions when you need to ensure multiple actions happen atomically (eg. trading), put indices on the fields you search most often, use VACUUM/OPTIMIZE TABLE/whatever during quiet periods to keep performance up, etc.
(Original answer below this point.)
To be honest I wouldn't store the quest requirement information in the relational database, but in some sort of script. Ultimately your idea of a 'requirement' takes on several varying forms which could draw on different sorts of data (eg. level, class, prior quests completed, item possession) and operators (a level might be a minimum or a maximum, some quests may require an item whereas others may require its absence, etc) not to mention a combination of conjunctions and disjunctions (some quests require all requirements to be met, whereas others may only require 1 of several to be met). This sort of thing is much more easily specified in an imperative language. That's not to say you don't have a quest table in the DB, just that you don't try and encode the sometimes arbitrary requirements into the schema. I'd have a requirement_script_id column to reference an external script. I suppose you could put the actual script into the DB as a text field if it suits, too.
Skill requirements are suited to the DB though, and quite trivial given the typical game system of learning skills as you progress through levels in a certain class:
table skill_levels
{
int skill_id FOREIGN KEY;
int class_id FOREIGN KEY;
int min_level;
}
myPotentialSkillList = SELECT * FROM skill_levels INNER JOIN
skill ON skill_levels.skill_id = skill.id
WHERE class_id = my_skill
ORDER BY skill_levels.min_level ASC;
Need a skill tree? Add a column prerequisite_skill_id. And so on.
Update:
Judging by the comments, it looks like a lot of people have a problem with XML. I know it's cool to bash it now and it does have its problems, but in this case I think it works. One of the other reasons that I chose it is that there are a ton of libraries for parsing it, so that can make life easier.
The other key concept is that the information is really non-relational. So yes, you could store the data in any particular example in a bunch of different tables with lots of joins, but that's a pain. But if I kept giving you a slightly different examples I bet you'd have to modify your design ad infinitum. I don't think adding tables and modifying complicated SQL statements is very much fun. So it's a little frustrating that #scheibk's comment has been voted up.
Original Post:
I think the problem you might have with storing quest information in the database is that it isn't really relational (that is, it doesn't really fit easily into a table). That might be why you're having trouble designing tables for the data.
On the other hand, if you put your quest information directly into code, that means you'll have to edit the code and recompile each time you want to add a quest. Lame.
So if I was you I might consider storing my quest information in an XML file or something similar. I know that's the generic solution for just about anything, but in this case it sounds right to me. XML is really made for storing non-relation and/or hierarchical data, just like the stuff you need to store for your quest.
Summary: You could come up with your own schema, create your XML file, and then load it at run time somehow (or even store the XML in the database).
Example XML:
<quests>
<quest name="Return Ring to Mordor">
<characterReqs>
<level>60</level>
<finishedQuests>
<quest name="Get Double Cheeseburger" />
<quest name="Go to Vegas for the Weekend" />
</finishedQuests>
<skills>
<skill name="nunchuks" />
<skill name="plundering" />
</skills>
<items>
<item name="genie's lamp" />
<item name="noise cancelling headphones for robin williams' voice />
</items>
</characterReqs>
<steps>
<step number="1">Get to Mordor</step>
<step number="2">Throw Ring into Lava</step>
<step number="3">...</step>
<step number="4">Profit</step>
</steps>
</quest>
</quests>
It sounds like you're ready for general object oriented design (OOD) principles. I'm going to purposefully ignore the context (gaming, MMO, etc) because that really doesn't matter to how you do a design process. And me giving you links is less useful than explaining what terms will be most helpful to look up yourself, IMO; I'll put those in bold.
In OOD, the database schema comes directly from your system design, not the other way around. Your design will tell you what your base object classes are and which properties can live in the same table (the ones in 1:1 relationship with the object) versus which to make mapping tables for (anything with 1:n or n:m relationships - for exmaple, one user has multiple stats, so it's 1:n). In fact, if you do the OOD correctly, you will have zero decisions to make regarding the final DB layout.
The "correct" way to do any OO mapping is learned as a multi-step process called "Database Normalization". The basics of which is just as I described: find the "arity" of the object relationships (1:1, 1:n,...) and make mapping tables for the 1:n's and n:m's. For 1:n's you end up with two tables, the "base" table and a "base_subobjects" table (eg. your "users" and "user_stats" is a good example) with the "foreign key" (the Id of the base object) as a column in the subobject mapping table. For n:m's, you end up with three tables: "base", "subobjects", and "base_subobjects_map" where the map has one column for the base Id and one for the subobject Id. This might be necessary in your example for N quests that can each have M requirements (so the requirement conditions can be shared among quests).
That's 85% of what you need to know. The rest is how to handle inheritance, which I advise you to just skip unless you're masochistic. Now just go figure out how you want it to work before you start coding stuff up and the rest is cake.
The thread in #Shea Daniel's answer is on the right track: the specification for a quest is non-relational, and also includes logic as well as data.
Using XML or Lua are examples, but the more general idea is to develop your own Domain-Specific Language to encode quests. Here are a few articles about this concept, related to game design:
The Whimsy Of Domain-Specific Languages
Using a Domain Specific Language for Behaviors
Using Domain-Specific Modeling towards Computer Games Development Industrialization
You can store the block of code for a given quest into a TEXT field in your database, but you won't have much flexibility to use SQL to query specific parts of it. For instance, given the skills a character currently has, which quests are open to him? This won't be easy to query in SQL, if the quest prerequisites are encoded in your DSL in a TEXT field.
You can try to encode individual prerequisites in a relational manner, but it quickly gets out of hand. Relational and object-oriented just don't go well together. You can try to model it this way:
Chars <--- CharAttributes --> AllAttributes <-- QuestPrereqs --> Quests
And then do a LEFT JOIN looking for any quests for which no prereqs are missing in the character's attributes. Here's pseudo-code:
SELECT quest_id
FROM QuestPrereqs
JOIN AllAttributes
LEFT JOIN CharAttributes
GROUP BY quest_id
HAVING COUNT(AllAttributes) = COUNT(CharAttributes);
But the problem with this is that now you have to model every aspect of your character that could be a prerequisite (stats, skills, level, possessions, quests completed) as some kind of abstract "Attribute" that fits into this structure.
This solves this problem of tracking quest prerequisites, but it leaves you with another problem: the character is modeled in a non-relational way, essentially an Entity-Attribute-Value architecture which breaks a bunch of relational rules and makes other types of queries incredibly difficult.
Not directly related to the design of your database, but a similar question was asked a few weeks back about class diagram examples for an RPG
I'm sure you can find something useful in there :)
Regarding your basic structure, you may (depending on the nature of your game) want to consider driving toward convergence of representation between player character and non-player characters, so that code that would naturally operate the same on either doesn't have to worry about the distinction. This would suggest, instead of having user and monster tables, having a character table that represents everything PCs and NPCs have in common, and then a user table for information unique to PCs and/or user accounts. The user table would have a character_id foreign key, and you could tell a player character row by the fact that a user row exists corresponding to it.
For representing quests in a model like yours, the way I would do it would look like:
quest_model
===============
id
name ['Quest for the Holy Grail', 'You Killed My Father', etc.]
etc.
quest_model_req_type
===============
id
name ['Minimum Level', 'Skill', 'Equipment', etc.]
etc.
quest_model_req
===============
id
quest_id
quest_model_req_type_id
value [10 (for Minimum Level), 'Horseback Riding' (for Skill), etc.]
quest
===============
id
quest_model_id
user_id
status
etc.
So a quest_model is the core definition of the quest structure; each quest_model can have 0..n associated quest_model_req rows, which are requirements specific to that quest model. Every quest_model_req is associated with a quest_model_req_type, which defines the general type of requirement: achieving a Minimum Level, having a Skill, possessing a piece of Equipment, and so on. The quest_model_req also has a value, which configures the requirement for this specific quest; for example, a Minimum Level type requirement might have a value of 20, meaning you must be at least level 20.
The quest table, then, is individual instances of quests that players are undertaking or have undertaken. The quest is associated with a quest_model and a user (or perhaps character, if you ever want NPCs to be able to do quests!), and has a status indicating where the progress of the quest stands, and whatever other tracking turns out useful.
This is a bare-bones structure that would, of course, have to be built out to accomodate the needs of particular games, but it should illustrate the direction I'd recommend.
Oh, and since someone else threw around their credentials, mine are that I've been a hobbyist game developer on live, public-facing projects for 16 years now.
I'd be extremely careful of what you actually store in a DB, especially for an MMORPG. Keep in mind, these things are designed to be MASSIVE with thousands of users, and game code has to execute excessively quickly and send a crap-ton of data over the network, not only to the players on their home connections but also between servers on the back-end. You're also going to have to scale out eventually and databases and scaling out are not two things that I feel mix particularly well, particularly when you start sharding into different regions and then adding instance servers to your shards and so on. You end up with a whole lot of servers talking to databases and passing a lot of data, some of which isn't even relevant to the game at all (SQL text going to a SQL server is useless network traffic that you should cut down on).
Here's a suggestion: Limit your SQL database to storing only things that will change as players play the game. Monsters and monster stats will not change. Items and item stats will not change. Quest goals will not change. Don't store these things in a SQL database, instead store them in the code somewhere.
Doing this means that every server that ever lives will always know all of this information without ever having to query a database. Now, you don't store quests at all, you just store accomplishments of the player and the game programatically determines the affects of those quests being completed. You don't waste data transferring information between servers because you're only sending event ID's or something of that nature (you can optimize the data you pass by only using just enough bits to represent all the event ID's and this will cut down on network traffic. May seem insignificant but nothing is insignificant in massive network apps).
Do the same thing for monster stats and item stats. These things don't change during gameplay so there's no need to keep them in a DB at all and therefore this information NEVER needs to travel over the network. The only thing you store is the ID of the items or monster kills or anything like that which is non-deterministic (i.e. it can change during gameplay in a way which you can't predict). You can have dedicated item servers or monster stat servers or something like that and you can add those to your shards if you end up having huge numbers of these things that occupy too much memory, then just pass the data that's necessary for a particular quest or area to the instance server that is handling that thing to cut down further on space, but keep in mind that this will up the amount of data you need to pass down the network to spool up a new instance server so it's a trade-off. As long as you're aware of the consequences of this trade-off, you can use good judgement and decide what you want to do. Another possibility is to limit instance servers to a particular quest/region/event/whatever and only equip it with enough information to the thing it's responsible for, but this is more complex and potentially limits your scaling out since resource allocation will become static instead of dynamic (if you have 50 servers of each quest and suddenly everyone goes on the same quest, you'll have 49 idle servers and one really swamped server). Again, it's a trade-off so be sure you understand it and make good choices for your application.
Once you've identified exactly what information in your game is non-deterministic, then you can design a database around that information. That becomes a bit easier: players have stats, players have items, players have skills, players have accomplishments, etc, all fairly easy to map out. You don't need descriptions for things like skills, accomplishments, items, etc, or even their effects or names or anything since the server can determine all that stuff for you from the ID's of those things at runtime without needing a database query.
Now, a lot of this probably sounds like overkill to you. After all, a good database can do queries very rapidly. However, your bandwidth is extremely precious, even in the data center, so you need to limit your use of it to only what is absolutely necessary to send and only send that data when it's absolutely necessary that it be sent.
Now, for representing quests in code, I would consider the specification pattern (http://en.wikipedia.org/wiki/Specification_pattern). This will allow you to easily build up quest goals in terms of what events are needed to ensure that the specification for completing that quest is met. You can then use LUA (or something) to define your quests as you build the game so that you don't have to make massive code changes and rebuild the whole damn thing to make it so that you have to kill 11 monsters instead of 10 to get the Sword of 1000 truths in a particular quest. How to actually do something like that I think is beyond the scope of this answer and starts to hit the edge of my knowledge of game programming so maybe someone else on here can help you out if you choose to go that route.
Also, I know I used a lot of terms in this answer, please ask if there are any that you are unfamiliar with and I can explain them.
Edit: didn't notice your addition about craftable items. I'm going to assume that these are things that a player can create specifically in the game, like custom items. If a player can continually change these items, then you can just combine the attributes of what they're crafted as at runtime but you'll need to store the ID of each attribute in the DB somewhere. If you make a finite number of things you can add on (like gems in Diablo II) then you can eliminate a join by just adding that number of columns to the table. If there are a finite number of items that can be crafted and a finite number of ways that differnet things can be joined together into new items, then when certain items are combined, you needn't store the combined attributes; it just becomes a new item which has been defined at some point by you already. Then, they just have that item instead of its components. If you clarify the behavior your game is to have I can add additional suggestions if that would be useful.
I would approach this from an Object Oriented point of view, rather than a Data Centric point of view. It looks like you might have quite a lot of (poss complex) objects - I would recommend getting them modeled (with their relationships) first, and relying on an ORM for persistence.
When you have a data-centric problem, the database is your friend. What you have done so far seems to be quite right.
On the other hand, the other problems you mention seem to be behaviour-centric. In this case, an object-oriented analisys and solution will work better.
For example:
Create a quest class with specificQuest child classes. Each child should implement a bool HasRequirements(Player player) method.
Another option is some sort of rules engine (Drools, for example if you are using Java).
If i was designing a database for such a situation, i might do something like this:
Quest
[quest properties like name and description]
reqItemsID
reqSkillsID
reqPlayerTypesID
RequiredItems
ID
item
RequiredSkills
ID
skill
RequiredPlayerTypes
ID
type
In this, the ID's map to the respective tables then you retrieve all entries under that ID to get the list of required items, skills, what have you. If you allow dynamic creation of items then you should have a mapping to another table that contains all possible items.
Another thing to keep in mind is normalization. There's a long article here but i've condensed the first three levels into the following more or less:
first normal form means that there are no database entries where a specific field has more than one item in it
second normal form means that if you have a composite primary key all other fields are fully dependent on the entire key not just parts of it in each table
third normal is where you have no non-key fields that are dependent on other non-key fields in any table
[Disclaimer: i have very little experience with SQL databases, and am new to this field. I just hope i'm of help.]
I've done something sort of similar and my general solution was to use a lot of meta data. I'm using the term loosely to mean that any time I needed new data to make a given decision(allow a quest, allow using an item etc.) I would create a new attribute. This was basically just a table with an arbitrary number of values and descriptions. Then each character would have a list of these types of attributes.
Ex: List of Kills, Level, Regions visited, etc.
The two things this does to your dev process are:
1) Every time there's an event in the game you need to have a big old switch block that checks all these attribute types to see if something needs updating
2) Everytime you need some data, check all your attribute tables BEFORE you add a new one.
I found this to be a good rapid development strategy for a game that grows organically(not completely planned out on paper ahead of time) - but it's one big limitation is that your past/current content(levels/events etc) will not be compatible with future attributes - i.e. that map won't give you a region badge because there were no region badges when you coded it. This of course requires you to update past content when new attributes are added to the system.
just some little points for your consideration :
1) Always Try to make your "get quest" requirements simple.. and "Finish quest" requirements complicated..
Part1 can be done by "trying to make your quests in a Hierarchical order":
example :
QuestA : (Kill Raven the demon) (quest req: Lvl1)
QuestA.1 : Save "unkown" in the forest to obtain some info.. (quest req : QuestA)
QuestA.2 : Craft the sword of Crystal ... etc.. (quest req : QuestA.1 == Done)
QuestA.3 : ... etc.. (quest req : QuestA.2 == Done)
QuestA.4 : ... etc.. (quest req : QuestA.3 == Done)
etc...
QuestB (Find the lost tomb) (quest req : ( QuestA.statues == Done) )
QuestC (Go To the demons Hypermarket) ( Quest req: ( QuestA.statues == Done && player.level== 10)
etc....
Doing this would save you lots of data fields/table joints.
ADDITIONAL THOUGHTS:
if you use the above system, u can add an extra Reward field to ur quest table called "enableQuests" and add the name of the quests that needs to be enabled..
Logically.. you'd have an "enabled" field assigned to each quest..
2) A minor solution for Your crafting problem, create crafting recipes, Items that contains To-be-Crafted-item crafting requirements stored in them..
so when a player tries to craft an item.. he needs to buy a recipe 1st.. then try crafting..
a simple example of such item Desc would be:
ItemName: "Legendary Sword of the dead"
Craftevel req. : 75
Items required:
Item_1 : Blade of the dead
Item_2 : A cursed seal
item_3 : Holy Gemstone of the dead
etc...
and when he presses the "craft" Action, you can parse it and compare against his inventory/craft box...
so Your Crafting DB will have only 1 field (or 2 if u want to add a crafting LvL req. , though it will already be included in the recipe.
ADDITIONAL THOUGHTS:
Such items, can be stored in xml format in the table .. which would make it much easier to parse...
3) A similar XML System can be applied to Your quest system.. to implement quest-ending requirements..

Personal names in a global application: What to store [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed last year.
Improve this question
Storing personal names in a structured way seems quite difficult when it comes to an application which is used by users from lots of different countries. The application I'm working on could theoretically be used by anyone from any place in the world.
Most often a given name (first name / forename) and surname seems to be used. In which case those two could simply be stored in the user database table.
Is storing "given name" and "surname" in the user table enough for a globally used application? Please give your opinion with a motivation.
Do you have any other suggestions?
Are there any good guides on how to solve this?
Some important facts:
Communication between users (who reside at the same or different companies and may be in different countries).
It's important that searching for users by name feels natural to users and that all important parts of a person name are searchable.
It would be nice if, when sending a message to someone in another country the system should be able to help by suggesting a proper greeting. It will probably be hard for arabic names, at least from what I've read as they seem to have a complex structure.
There isn't really a universal structured way to do this. I'd have a big field for "Full Name" and another field for "Display Name". Both unicode.
For example, in Spanish-speaking countries, IIRC, people usually have FOUR names. Two given names, and two surnames (one from the father, one from the mother). Arabs essentially have a linked list of names as far back as they choose to go (So-and-so, son of so-and-so, son of so-and-so, ...). East Asian countries tend to put the given names last, whilst Europeans put the given names first.
If you are really interested in going global across all cultures, take a look at the HR-XML specification of Person. Given name & surname just doesn't cut it when you move outside of the West. Eastern standards of FamilyName GivenName will trip you up on defining FullName. Besides, you have all the potential complexities of alternate scripts (not everybody uses the Latin alphabet, y'know), prefixes and suffixes (NN Sr, van der Waals).
It's a standard intended for transmission and integration rather than storage, but don't let the XML schema syntax scare you. I wouldn't implement every aspect of it, but it's an excellent provider of corner cases which aren't immediately obvious, and which you can then ignore consciously. In particular, check out the examples at the end!
And for goodness' sake, don't create an American application which assumes a middle initial for everyone!
In general, a name is a human-readable identifier of a person. For a given person, you will need to store one name for each use case of such an identifier. You will need a name to display as part of the postal address; you may need one to use as a "screen name"; you may need one to use in the openning of a letter to the person; you may sometimes need to include titles and honors of the person, sometimes not.
In any of these cases, what you want to display is a single string. You may or may not be able to avoid duplication of effort by storing components of these strings and putting them back together later. But in general, both in terms of national culture and professional culture, you may be better off just storing the full strings.
The same, BTW, largely occurs for postal addresses, except that there are international standards that permit the postal authority of one country to send mail to persons in another country.
One note: don't require both a "first name" and a "last name".
Some people, like me, only have one name.
(Proof: http://saizai.com/dl_redacted_small.png)

What is the "best" way to store international addresses in a database?

What is the "best" way to store international addresses in a database? Answer in the form of a schema and an explanation of the reasons why you chose to normalize (or not) the way you did. Also explain why you chose the type and length of each field.
Note: You decide what fields you think are necessary.
Plain freeform text.
Validating all the world's post/zip codes is too hard; a fixed list of countries is too politically sensitive; mandatory state/region/other administrative subdivision is just plain inappropriate (all too often I'm asked which county I live in--when I don't, because Greater London is not a county at all).
More to the point, it's simply unnecessary. Your application is highly unlikely to be modelling addresses in any serious way. If you want a postal address, ask for the postal address. Most people aren't so stupid as to put in something other than a postal address, and if they do, they can kiss their newly purchased item bye-bye.
The exception to this is if you're doing something that's naturally constrained to one country anyway. In this situation, you should ask for, say, the { postcode, house number } pair, which is enough to identify a postal address. I imagine you could achieve similar things with the extended zip code in the US.
In the past I've modeled forms that needed to be international after the ups/fedex shipping address forms on their websites (I figured if they don't know how to handle an international order we are all hosed). The fields they use can be used as reference for setting up your schema.
In general, you need to understand why you want an address. Is it for shipping/mailing? Then there is really only one requirement, have the country separate. The other lines are freeform, to be filled in by the user. The reason for this is the common forwarding strategy for mail : any incoming mail for a foreign country is forwarded without looking at the other address lines. Hence, the detailed information is parsed only by the mail sorter located in the country itself. Like the receiver, they'll be familiar with national conventions.
(UPS may bunch together some small European countries, e.. all the Low Countries are probably served from Belgium - the idea still holds.)
I think adding country/city and address text will be fine. country and city should be separate for reporting. Managers always ask for these kind of reports which you do not expect and I dont prefer running a LIKE query through a large database.
Not to give Facebook undue respect. However, the overall structure of the database seems to be overlooked in many web applications launching every day. Obviously I don't think there is a perfect solution that covers all the potential variables with address structure without some hard work. That said, combined with autocomplete Facebook manages to take location input data and eliminate a majority of their redundant entries. They do this by organizing their database well enough to provide autocomplete information in a low cost, low error way to the client in real time allowing them to more or less choose the correct location from an existing list.
I think the best solution is to access a third party database which contains your desired geographic scope and use it to initially seed your user location information. This will allow you to avoid doing the groudwork of creating your own. With any luck you can reduce the load on your server by allowing your new users to receive the correct autocomplete information directly off your third party supplier. Eventually you will be able to fill most autocomplete for location information such as city, country, etc. from information contained in your own database from user input data.
You need to provide a bit more details about how you are planning to use the data. For example, fields like City, State, Country can either be text in the single table, or be codes which are linked to a separate table with a Foreign Key.
Simplest would be
Address_Line_01 (Required, Non blank)
Address_Line_02
Address_Line_03
Landmark
City (Required)
Pin (Required)
Province_District
State (Required)
Country (Required)
All the above can be Text/Unicode with appropriate field lengths.
Phone Numbers as applicable.

Resources