Design - Dynamic data mapping [closed] - database

I am working on an online tool that serves a number of merchants (say, retail merchants). The application takes data from different merchants and provides some data on their retail shop. The idea is that any merchant can sign up for the tool, send their transaction and inventory data (perhaps by uploading an Excel file, or my application can accept a JSON object), and get the result back.
My application has a domain model that is intrinsic to the application and contains all the data points that merchants can use, e.g.
Product {
productId,
productName,
...
}
But the problem I am facing is that each merchant has their own way of representing data; e.g. merchant x may call product prod, while merchant y may call it proddt.
Now I need a way to convert data in the merchant's format into a form that the application understands, i.e. each time there is a request from merchant x, the application should map prod to product, and so on.
At first I thought of coding these mappers by hand, but that is not viable: I can't write mappings for the thousands of merchants that may join my application.
Another solution I was thinking of is to let the merchant map fields from their domain to the application domain through a UI, save the mapping in the DB, and on each request from that merchant look up the mapping and apply it to the incoming data. (Though I am still unsure how this can be done.)
Has anyone faced a similar design issue before and knows a better way of solving this problem?

If you can fix the order of the fields, then you can easily map the data sent by your clients and return the result. For example, in Excel your client can provide data in this format:
product | name | quantity | cost
Condition: ALL of your clients must send data in this format.
Then it will be easy for you to map these fields, read them into the correct DTO, and later save and process the data.
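A minimal sketch of the fixed-order approach described above, assuming rows arrive as pipe-separated text. The `ProductRow` DTO, the `parse_row` helper and the field types are illustrative assumptions, not part of the original answer.

```python
from dataclasses import dataclass
from decimal import Decimal

# Agreed column order, taken from the format line above.
FIELD_ORDER = ("product", "name", "quantity", "cost")


@dataclass
class ProductRow:  # hypothetical DTO
    product: str
    name: str
    quantity: int
    cost: Decimal


def parse_row(line: str) -> ProductRow:
    """Turn one 'product | name | quantity | cost' line into a DTO."""
    cells = [cell.strip() for cell in line.split("|")]
    if len(cells) != len(FIELD_ORDER):
        # A row with the wrong number of fields is rejected, not guessed at.
        raise ValueError(f"expected {len(FIELD_ORDER)} fields, got {len(cells)}")
    product, name, quantity, cost = cells
    return ProductRow(product, name, int(quantity), Decimal(cost))
```

Because the position alone decides the meaning of each cell, this only works as long as every client really does keep to the agreed order.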

I appreciate this "language" concern, and in fact multi-lingual applications do it the way you describe. You need to standardize your terminology at your end, so that each term has only one meaning and only one word/term to describe it. You could even use mnemonics for that, e.g. for "favourite product" you use "Fav_Prod" in your app and in your DB. Then, when you present data to your customer, your app looks up their preferred term in a look-up table, and uses "favourite product" for customer one (and perhaps the admin), "favr prod" for customer two, etc.
Look at SQL and DB design, you'll find that this is a form of normalization.
Are you dealing with legacy systems and/or APIs at the customer end? If so, someone will indeed have to type in the data.
If you have 1000s of customers, but there are only 10 to 50 terms, it may be best to let the customer, not you, set the terms.
You might be lucky, and be able to cluster customers together who use similar or close enough terminology. For new customers you could offer them a menu of terms that they can choose from.
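The look-up table idea can be sketched roughly as follows. The merchant ids, the merchant-side terms and the in-memory dictionary are all assumptions for illustration; in a real system the table would presumably live in the database and be filled in by each customer.

```python
# Hypothetical look-up table: merchant id -> {merchant term: canonical term}.
MERCHANT_TERMS = {
    "merchant_x": {"prod": "product_name", "pid": "product_id", "qty": "quantity"},
    "merchant_y": {"proddt": "product_name", "id": "product_id", "count": "quantity"},
}


def to_canonical(merchant_id, record):
    """Rename the keys of one merchant record to the application's own terms."""
    terms = MERCHANT_TERMS[merchant_id]
    unknown = [key for key in record if key not in terms]
    if unknown:
        # An unmapped field is reported instead of being silently dropped.
        raise ValueError(f"unmapped fields for {merchant_id}: {unknown}")
    return {terms[key]: value for key, value in record.items()}
```

With only a few dozen canonical terms, each new customer adds one small dictionary (or a handful of rows) rather than new code.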

If merchants were required to input their mapping with their data, your tool would not require a DB. In JSON, the input could be like the following:
input = {mapping: {ID: "productId", name: "productName"}, data: {productId: 0, productName: ""}}
Then, you could convert data represented in any merchant's format to your tool's format as follows:
ID = input.data[input.mapping.ID]
name = input.data[input.mapping.name]
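The same idea as a small Python sketch, assuming the payload has already been parsed from JSON into a dictionary. The function name `apply_mapping` and the merchant-side field names `pid` and `prod` are made up for the example.

```python
import json


def apply_mapping(payload):
    """Build a record in the tool's format from a payload that carries its own mapping.

    payload["mapping"] maps each of the tool's field names to the
    merchant's field name; payload["data"] holds the merchant's record.
    """
    mapping = payload["mapping"]
    data = payload["data"]
    return {tool_field: data[merchant_field]
            for tool_field, merchant_field in mapping.items()}


# Example payload from a hypothetical merchant.
example = json.loads(
    '{"mapping": {"ID": "pid", "name": "prod"}, "data": {"pid": 42, "prod": "Tea"}}'
)
record = apply_mapping(example)
```

A missing merchant field surfaces as a `KeyError`, which may or may not be the behaviour you want at the API boundary.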

To recap:
You have an application
You want to load client data (merchants in this case) into your application
Your clients “own” and manage this data
There are N ways in which such client data can be managed, where N <= the number of possible clients
You will run out of money and your business will close before you can build support for all N models
Unless you are Microsoft, or Amazon, or Facebook, or otherwise have access to significant resources (time, people, money)
This may seem harsh, but it is pragmatic. You should assume NOTHING about how potential clients will be storing their data. Get anything wrong, your product will process and return bad results, and you will lose that client. Unless your clients are using the same data management tools—and possibly even then—their data structures and formats will differ, and could differ significantly.
Based on my not entirely limited experience, I see three possible ways to handle this.
1) Define your own way of modeling data. Require your customers to provide data in this format. Accept that this will limit your customer base.
2) Identify the most likely ways (models) your potential clients will be storing data (e.g. the most common existing software systems they might be using for this). Build import structures and formats to support these models. This, too, will limit your customer base.
3) Start with either of the above. Then, as part of your business model, agree to build out your system to support clients who sign up. If you already support their data model, great! If not, you will have to build it out. Maybe you can work the expense of this into what you charge them, maybe not. Your customer base will be limited by how efficiently you can add new and functional means of loading data to your system.

Related

How do I structure multiple Identity Data in a database

I am designing a database for a credit bureau and am seeking some guidance.
The data they receive from banks, MFIs, Saccos, utility companies etc. comes with various types of IDs. E.g. it is perfectly legal to open a bank account with a National ID and also with a Passport. The scenario that has my head banging is that Customer1 takes a credit facility (call it a loan for now) at Bank1 with their passport, then goes to Bank2 and takes another loan with their NationalID, and to Bank3 with their MilitaryID. Eventually, when this data comes from the banks to the bureau, it is seen as 3 different people, while we know that it is actually 1 person. At this point, there is nothing we can do as a bureau.
However, one way out (for now) is to use the Govt registry, which provides a repository that holds both passports and IDs. So once we query for this information and get a response, how do I show in the DB that Passport_X is related to NationalID_Y and MilitaryNumber_Z?
Again, a person's name could be captured in various orders. Bank1 could do FName, LName, OName while Bank3 does LName, FName only. How do I store these names?
Even against one ID type, e.g. NationalID, you will often find misspelt or missing names. So one NationalID in our database could end up with about 6 different names because the person's name was captured differently by the various banks where he has transacted.
And that is just the tip of the iceberg. We have issues with addresses, telephone numbers, etc etc.
Could you have any insight as to how I'd structure my database to ensure we capture all data from all banks and provide the most accurate information possible regarding an individual? Better yet, do you have experience with this type of setup?
Thanks.
how do I show in the DB that Passport_X is related to NationalID_Y and MilitaryNumber_Z?
Trivial.
You have an identity table with an AlternateId field that is set if the identity is linked to another one. Use the first identity you created as the master. Any alternative will have AlternateId pointing to it.
You need to separate the identity from the data in it, so you can have alternate versions of it, possibly with an origin and a timestamp. You likely need to fully support versioning and tying different identities to each other as alternatives, including generating a "master identity", possibly by algorithm, with the "official" (i.e. consolidated) version of your data.
The details are complex - mostly you have to make a LOT of compromises without killing performance, so in the end HIRE A SPECIALIST. There is a reason there are senior database designers and architects out there with 20+ years of experience finding the optimal solution given constraints you may not even be aware of (application-wise).
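As a rough illustration of the AlternateId idea, here is a sketch using an in-memory SQLite database. The table layout, column names and the `master_id` helper are assumptions; versioning, origin and timestamps are left out to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE identity (
        id           INTEGER PRIMARY KEY,
        id_type      TEXT NOT NULL,                     -- e.g. 'passport'
        id_number    TEXT NOT NULL,
        alternate_id INTEGER REFERENCES identity(id),   -- NULL on the master row
        UNIQUE (id_type, id_number)
    )
""")
# The first identity seen becomes the master; later ones point to it.
conn.execute("INSERT INTO identity VALUES (1, 'passport', 'Passport_X', NULL)")
conn.execute("INSERT INTO identity VALUES (2, 'national_id', 'NationalID_Y', 1)")
conn.execute("INSERT INTO identity VALUES (3, 'military_id', 'MilitaryNumber_Z', 1)")


def master_id(id_type, id_number):
    """Return the master identity id for any known document, or None."""
    row = conn.execute(
        "SELECT COALESCE(alternate_id, id) FROM identity "
        "WHERE id_type = ? AND id_number = ?",
        (id_type, id_number),
    ).fetchone()
    return row[0] if row else None
```

Loans reported under any of the three documents can then be grouped by the value `master_id` returns.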
Better yet, do you have experience with this type of setup?
Yes. Try financial information. Stock symbols / feeds / definitions are not necessarily compatible and vary depending on whom you get them from. Any non-trivial setup has different data feeds that may show the same item slightly differently, sometimes in error. Different name, sometimes different price (example: ES, CME group, is 50 USD per point, but on TT Fix it is 5 - to make up for that, the price is multiplied by 10, so instead of 1000.25 you get 10002.5). This is the same line of consolidation, and it STINKS.
Tons of code, tons of proper database design, redoing it half a dozen times to get the proper performance. This is tricky, sadly.
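The price example above can be made concrete with a small sketch. The feed keys and the `FEED_PRICE_SCALE` table are hypothetical; the only figures taken from the answer are the factor of 10 and the two prices.

```python
from decimal import Decimal

# Hypothetical per-feed settings: (feed, symbol) -> factor the feed applies to prices.
FEED_PRICE_SCALE = {
    ("CME", "ES"): Decimal("1"),
    ("TT_FIX", "ES"): Decimal("10"),
}


def normalize_price(feed, symbol, raw_price):
    """Convert a feed-specific price to the consolidated price scale."""
    scale = FEED_PRICE_SCALE[(feed, symbol)]
    # Decimal avoids binary floating-point surprises when dividing prices.
    return Decimal(raw_price) / scale
```

Each new feed then needs an entry in the scale table rather than special-case code, though real feeds tend to differ in more than one dimension.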

"Parametrized" database model & backend storage system as well as data mining manipulation

I have implicitly made this a community wiki seeing that the answers can be quite broad.
I'm working with a start-up company to accomplish the following goal.
In medical research, a patient's medical record can hold an unbounded amount of data about a patient for a specific diagnosis, e.g. a smoker has a higher chance of getting lung cancer, but that doesn't necessarily mean that a non-smoker can't get lung cancer. My goal is to create/use a database model that can deal with such parameters.
Now, I also have to come up with ways to data mine this parametrized data to create statistics, e.g. see the trends for all 40-year-old females who suffered from lung cancer. That report can be generic (graph, tabular, etc.), so that doctors can see trends or analyse possible solutions that can work.
My questions are:
1) Which database systems allow parametrized backend storage (e.g. Cassandra), can easily be used from Java, and are very efficient at data retrieval, linkage, etc.? We are dealing with a high number of patient records per state.
2) What algorithms or AI techniques can I use for data mining? Are there any mining techniques out there that can help me do this?
PS How does Google Analytics deal with parametrised data?
PPS Parametrized data is data which has a key and a value, where the value can be a plain value, another key-value pair, a list of values, or a set of parametrized data (organized or unorganized).
I'm looking forward to your suggestions! :-D
I'll try to answer your first question only.
Cassandra is a key-value datastore (in your terms, parametrized). If you use Cassandra, you need more computation time to derive complex reports, because it stores data in raw format. NoSQL databases like Cassandra are good if you want to scale very, very big. They are eventually consistent, and they make compromises on data replication and latency.
In your case, since patient data can take practically any form, try the model of a triple store (Semantic Web frameworks like Jena, OpenSesame, etc.). They allow loose data structures that can be molded at runtime. Also, their query languages (SPARQL, SeRQL) give you more power than NoSQL stores like Cassandra, although their querying capabilities are still more limited than those of an RDBMS.
For this question, this is how we have implemented this.
We created a keyspace called medical and a supercolumn family called patient.
Under the supercolumn family, we have a general supercolumn, which stores the patient details, and another supercolumn called operation, which keeps a record of the user occupation.
Don't forget that the general supercolumn keeps a record of the patient each time he/she comes to the doctor. That way, we know the patient's exact condition before, during and after the operation.
I know some data can be duplicated, but no two supercolumns can be identical, as there is no way to have two different patients with identical attributes and sickness.
So basically, Cassandra allows 3 layers of abstraction, Keyspace, Column/Supercolumn family, Column/Supercolumn.
Hope this can help somebody.

Designing tables for storing various requirements and stats for multiplayer game

Original Question:
Hello,
I am creating a very simple hobby project - a browser-based multiplayer game. I am stuck at designing tables for storing information about quest / skill requirements.
For now, I have designed my tables in the following way:
table user (basic information about users)
table stat (variety of stats)
table user_stats (connecting each user with stats)
Another example:
table monsters (basic information about npc enemies)
table monster_stats (connecting monsters with stats, using the same stat table from above)
Those were the simple cases. I must admit, that I am stuck while designing requirements for different things, e.g quests. Sample quest A might have only minimum character level requirement (and that is easy to implement) - but another one, quest B has multitude of other reqs (finished quests, gained skills, possessing specific items, etc) - what is a good way of designing tables for storing this kind of information?
In a similar manner - what is an efficient way of storing information about skill requirements? (specific character class, min level, etc).
I would be grateful for any help or information about creating database driven games.
Edit:
Thank you for the answers, yet I would like to receive more. As I am having some problems designing a rather complicated database layout for craftable items, I am starting a max bounty for this question.
I would like to receive links to articles / code snippets / anything connected with best practices of designing databases for storing game data (a good example of this kind of information is available on buildingbrowsergames.com).
I would be grateful for any help.
I'll edit this to add as many other pertinent issues as I can, although I wish the OP would address my comment above. I speak from several years as a professional online game developer and many more years as a hobbyist online game developer, for what it's worth.
Online games imply some sort of persistence, which means that you have broadly two types of data - one is designed by you, the other is created by the players in the course of play. Most likely you are going to store both in your database. Make sure you have different tables for these and cross-reference them properly via the usual database normalisation rules. (eg. If your player crafts a broadsword, you don't create an entire new row with all the properties of a sword. You create a new row in the player_items table with the per-instance properties, and refer to the broadsword row in the item_types table which holds the per-itemtype properties.) If you find a row of data is holding some things that you designed and some things that the player is changing during play, you need to normalise it out into two tables.
This is really the typical class/instance separation issue, and applies to many things in such games: a goblin instance doesn't need to store all the details of what it means to be a goblin (eg. green skin), only things pertinent to that instance (eg. location, current health). Some times there is a subtlety to the act of construction, in that instance data needs to be created based on class data. (Eg. setting a goblin instance's starting health based upon a goblin type's max health.) My advice is to hard-code these into your code that creates the instances and inserts the row for it. This information only changes rarely since there are few such values in practice. (Initial scores of depletable resources like health, stamina, mana... that's about it.)
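A minimal sketch of the type/instance split, using an in-memory SQLite database. The table names follow the ones mentioned above (item_types, player_items); the columns and the `craft_item` helper are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Designed by you: one row per kind of item.
    CREATE TABLE item_types (
        id             INTEGER PRIMARY KEY,
        name           TEXT    NOT NULL,
        max_damage     INTEGER NOT NULL,
        max_durability INTEGER NOT NULL
    );
    -- Created during play: one row per item a player actually owns.
    CREATE TABLE player_items (
        id           INTEGER PRIMARY KEY,
        player_id    INTEGER NOT NULL,
        item_type_id INTEGER NOT NULL REFERENCES item_types(id),
        durability   INTEGER NOT NULL
    );
    INSERT INTO item_types VALUES (1, 'broadsword', 12, 100);
""")


def craft_item(player_id, item_type_id):
    """Create an instance row; starting durability is copied from the type row."""
    (max_durability,) = conn.execute(
        "SELECT max_durability FROM item_types WHERE id = ?", (item_type_id,)
    ).fetchone()
    cursor = conn.execute(
        "INSERT INTO player_items (player_id, item_type_id, durability) VALUES (?, ?, ?)",
        (player_id, item_type_id, max_durability),
    )
    return cursor.lastrowid
```

The instance row stores only what can differ per item (owner, current durability); everything that defines a broadsword stays in the single type row.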
Try and find a consistent terminology to separate instance data from type data - this will make life easier later when you're patching a live game and trying not to trash the hard work of your players by editing the wrong tables. This also makes caching a lot easier - you can typically cache your class/type data with impunity because it only ever changes when you, the designer, push new data up there. You can run it through memcached, or consider loading it all at start-up time if your game has a continuous process (i.e. is not PHP/ASP/CGI/etc).
Remember that deleting anything from your design-side data is risky once you go live, since player-generated data may refer back to it. Test everything thoroughly locally before deploying to the live server because once it's up there, it's hard to take it down. Consider ways to be able to mark rows of such data as removed in a safe fashion - maybe a boolean 'live' column which, if set to false, means it just won't show up in the typical query. Think about the impact on players if you disable items they earned (and doubly if these are items they paid for).
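The 'live' flag could look something like this sketch (in-memory SQLite; the table contents and helper names are invented for the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item_types (
        id   INTEGER PRIMARY KEY,
        name TEXT    NOT NULL,
        live INTEGER NOT NULL DEFAULT 1   -- 0 means retired, but the row stays
    );
    INSERT INTO item_types (id, name) VALUES (1, 'broadsword'), (2, 'cursed dagger');
""")


def retire_item_type(item_type_id):
    """Hide a type from normal queries without deleting its row."""
    conn.execute("UPDATE item_types SET live = 0 WHERE id = ?", (item_type_id,))


def live_item_names():
    """The typical query: only types that are still live."""
    rows = conn.execute("SELECT name FROM item_types WHERE live = 1 ORDER BY id")
    return [name for (name,) in rows]
```

Player-owned rows that reference a retired type keep resolving, because the type row is never removed.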
The actual crafting side can't really be answered without knowing how you want to design your game. The database design must follow the game design. But I'll run through a trivial idea. Maybe you will want to be able to create a basic object and then augment it with runes or crystals or whatever. For that, you just need a one-to-many relationship between item instance and augmentation instance. (Remember, you might have item type and augmentation type tables too.) Each augmentation can specify a property of an item (eg. durability, max damage done in combat, weight) and a modifier (typically as a multiplier, eg. 1.1 to add a 10% bonus). You can see my explanation for how to implement these modifying effects here and here - the same principles apply for temporary skill and spell effects as apply for permanent item modification.
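To illustrate the property-plus-multiplier idea, here is a small sketch. It is not the implementation linked from the answer; the `Augmentation` class and `effective_properties` function are assumptions, and the modifiers are simply applied in order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Augmentation:
    property_name: str  # which item property is modified, e.g. "max_damage"
    multiplier: float   # e.g. 1.1 adds a 10% bonus


def effective_properties(base, augmentations):
    """Apply each augmentation's multiplier to a copy of the base properties."""
    result = dict(base)
    for augmentation in augmentations:
        result[augmentation.property_name] *= augmentation.multiplier
    return result


# Hypothetical broadsword with two runes attached.
sword = {"durability": 100, "max_damage": 20, "weight": 8}
runes = [Augmentation("max_damage", 1.1), Augmentation("durability", 0.5)]
augmented = effective_properties(sword, runes)
```

The base values are left untouched, so removing a rune only means recomputing from the type data.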
For character stats in a database driven game, I would generally advise to stick with the naïve approach of one column (integer or float) per statistic. Adding columns later is not a difficult operation and since you're going to be reading these values a lot, you might not want to be performing joins on them all the time. However, if you really do need the flexibility, then your method is fine. This strongly resembles the skill level table I suggest below: lots of game data can be modelled in this way - map a class or instance of one thing to a class or instance of other things, often with some additional data to describe the mapping (in this case, the value of the statistic).
Once you have these basic joins set up - and indeed any other complex queries that result from the separation of class/instance data in a way that may not be convenient for your code - consider creating a view or a stored procedure to perform them behind the scenes so that your application code doesn't have to worry about it any more.
Other good database practices apply, of course - use transactions when you need to ensure multiple actions happen atomically (eg. trading), put indices on the fields you search most often, use VACUUM/OPTIMIZE TABLE/whatever during quiet periods to keep performance up, etc.
(Original answer below this point.)
To be honest I wouldn't store the quest requirement information in the relational database, but in some sort of script. Ultimately your idea of a 'requirement' takes on several varying forms which could draw on different sorts of data (eg. level, class, prior quests completed, item possession) and operators (a level might be a minimum or a maximum, some quests may require an item whereas others may require its absence, etc) not to mention a combination of conjunctions and disjunctions (some quests require all requirements to be met, whereas others may only require 1 of several to be met). This sort of thing is much more easily specified in an imperative language. That's not to say you don't have a quest table in the DB, just that you don't try and encode the sometimes arbitrary requirements into the schema. I'd have a requirement_script_id column to reference an external script. I suppose you could put the actual script into the DB as a text field if it suits, too.
Skill requirements are suited to the DB though, and quite trivial given the typical game system of learning skills as you progress through levels in a certain class:
table skill_levels
{
int skill_id FOREIGN KEY;
int class_id FOREIGN KEY;
int min_level;
}
myPotentialSkillList = SELECT * FROM skill_levels INNER JOIN
skill ON skill_levels.skill_id = skill.id
WHERE skill_levels.class_id = my_class_id -- the character's class, not a skill
ORDER BY skill_levels.min_level ASC;
Need a skill tree? Add a column prerequisite_skill_id. And so on.
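A runnable version of the skill table and query, sketched with an in-memory SQLite database. The sample skills, class ids and the `available_skills` helper are assumptions; the prerequisite_skill_id column is the skill-tree extension suggested above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE skill (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE skill_levels (
        skill_id              INTEGER NOT NULL REFERENCES skill(id),
        class_id              INTEGER NOT NULL,
        min_level             INTEGER NOT NULL,
        prerequisite_skill_id INTEGER REFERENCES skill(id)  -- NULL: no prerequisite
    );
    INSERT INTO skill VALUES (1, 'slash'), (2, 'whirlwind'), (3, 'fireball');
    INSERT INTO skill_levels VALUES (1, 10, 1, NULL), (2, 10, 5, 1), (3, 20, 1, NULL);
""")


def available_skills(class_id, level, known_skill_ids):
    """Skills a character of this class and level may learn now."""
    rows = conn.execute(
        """
        SELECT skill.name, skill_levels.prerequisite_skill_id
        FROM skill_levels
        INNER JOIN skill ON skill_levels.skill_id = skill.id
        WHERE skill_levels.class_id = ? AND skill_levels.min_level <= ?
        ORDER BY skill_levels.min_level ASC
        """,
        (class_id, level),
    )
    return [name for name, prerequisite in rows
            if prerequisite is None or prerequisite in known_skill_ids]
```

The prerequisite check is done in Python here for readability; it could equally be pushed into the SQL.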
Update:
Judging by the comments, it looks like a lot of people have a problem with XML. I know it's cool to bash it now and it does have its problems, but in this case I think it works. One of the other reasons that I chose it is that there are a ton of libraries for parsing it, so that can make life easier.
The other key concept is that the information is really non-relational. So yes, you could store the data in any particular example in a bunch of different tables with lots of joins, but that's a pain. And if I kept giving you slightly different examples, I bet you'd have to modify your design ad infinitum. I don't think adding tables and modifying complicated SQL statements is very much fun. So it's a little frustrating that @scheibk's comment has been voted up.
Original Post:
I think the problem you might have with storing quest information in the database is that it isn't really relational (that is, it doesn't really fit easily into a table). That might be why you're having trouble designing tables for the data.
On the other hand, if you put your quest information directly into code, that means you'll have to edit the code and recompile each time you want to add a quest. Lame.
So if I were you I might consider storing my quest information in an XML file or something similar. I know that's the generic solution for just about anything, but in this case it sounds right to me. XML is really made for storing non-relational and/or hierarchical data, just like the stuff you need to store for your quest.
Summary: You could come up with your own schema, create your XML file, and then load it at run time somehow (or even store the XML in the database).
Example XML:
<quests>
  <quest name="Return Ring to Mordor">
    <characterReqs>
      <level>60</level>
      <finishedQuests>
        <quest name="Get Double Cheeseburger" />
        <quest name="Go to Vegas for the Weekend" />
      </finishedQuests>
      <skills>
        <skill name="nunchuks" />
        <skill name="plundering" />
      </skills>
      <items>
        <item name="genie's lamp" />
        <item name="noise cancelling headphones for robin williams' voice" />
      </items>
    </characterReqs>
    <steps>
      <step number="1">Get to Mordor</step>
      <step number="2">Throw Ring into Lava</step>
      <step number="3">...</step>
      <step number="4">Profit</step>
    </steps>
  </quest>
</quests>
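Loading such a file at run time might look like the following sketch, which uses Python's standard xml.etree.ElementTree module on a shortened copy of the example. The function names and the shape of the character dictionary are assumptions.

```python
import xml.etree.ElementTree as ET

QUESTS_XML = """
<quests>
  <quest name="Return Ring to Mordor">
    <characterReqs>
      <level>60</level>
      <finishedQuests><quest name="Get Double Cheeseburger" /></finishedQuests>
      <skills><skill name="nunchuks" /></skills>
      <items><item name="genie's lamp" /></items>
    </characterReqs>
  </quest>
</quests>
"""


def load_requirements(xml_text):
    """Map each top-level quest name to its parsed character requirements."""
    root = ET.fromstring(xml_text.strip())
    quests = {}
    for quest in root.findall("quest"):  # direct children only
        reqs = quest.find("characterReqs")
        quests[quest.get("name")] = {
            "level": int(reqs.findtext("level", default="0")),
            "finished_quests": {q.get("name") for q in reqs.findall("finishedQuests/quest")},
            "skills": {s.get("name") for s in reqs.findall("skills/skill")},
            "items": {i.get("name") for i in reqs.findall("items/item")},
        }
    return quests


def meets(character, reqs):
    """True if the character satisfies every requirement of the quest."""
    return (character["level"] >= reqs["level"]
            and reqs["finished_quests"] <= character["finished_quests"]
            and reqs["skills"] <= character["skills"]
            and reqs["items"] <= character["items"])
```

Adding a quest then means editing the XML, not the code, which is the point of the approach.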
It sounds like you're ready for general object oriented design (OOD) principles. I'm going to purposefully ignore the context (gaming, MMO, etc) because that really doesn't matter to how you do a design process. And me giving you links is less useful than explaining what terms will be most helpful to look up yourself, IMO; I'll put those in bold.
In OOD, the database schema comes directly from your system design, not the other way around. Your design will tell you what your base object classes are and which properties can live in the same table (the ones in a 1:1 relationship with the object) versus which to make mapping tables for (anything with 1:n or n:m relationships - for example, one user has multiple stats, so it's 1:n). In fact, if you do the OOD correctly, you will have zero decisions to make regarding the final DB layout.
The "correct" way to do any OO mapping is learned as a multi-step process called "Database Normalization". The basics are just as I described: find the "arity" of the object relationships (1:1, 1:n, ...) and make mapping tables for the 1:n's and n:m's. For 1:n's you end up with two tables, the "base" table and a "base_subobjects" table (your "users" and "user_stats" are a good example), with the "foreign key" (the Id of the base object) as a column in the subobject mapping table. For n:m's, you end up with three tables: "base", "subobjects", and "base_subobjects_map", where the map has one column for the base Id and one for the subobject Id. This might be necessary in your example for N quests that can each have M requirements (so the requirement conditions can be shared among quests).
That's 85% of what you need to know. The rest is how to handle inheritance, which I advise you to just skip unless you're masochistic. Now just go figure out how you want it to work before you start coding stuff up and the rest is cake.
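The two table shapes described above can be sketched with an in-memory SQLite database. Table and column names beyond users and user_stats are assumptions chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- 1:n: the child table carries the foreign key of its base row.
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE user_stats (
        user_id INTEGER NOT NULL REFERENCES users(id),
        stat    TEXT    NOT NULL,
        value   INTEGER NOT NULL
    );
    -- n:m: a third table holds one row per (quest, requirement) pair.
    CREATE TABLE quests (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE requirements (id INTEGER PRIMARY KEY, description TEXT NOT NULL);
    CREATE TABLE quest_requirements_map (
        quest_id       INTEGER NOT NULL REFERENCES quests(id),
        requirement_id INTEGER NOT NULL REFERENCES requirements(id),
        PRIMARY KEY (quest_id, requirement_id)
    );
    INSERT INTO quests VALUES (1, 'Quest A'), (2, 'Quest B');
    INSERT INTO requirements VALUES (1, 'level >= 10'), (2, 'has sword');
    INSERT INTO quest_requirements_map VALUES (1, 1), (2, 1), (2, 2);
""")


def requirements_for(quest_name):
    """Follow the map table from a quest to its (possibly shared) requirements."""
    rows = conn.execute(
        """
        SELECT r.description
        FROM quests AS q
        JOIN quest_requirements_map AS m ON m.quest_id = q.id
        JOIN requirements AS r ON r.id = m.requirement_id
        WHERE q.name = ?
        ORDER BY r.id
        """,
        (quest_name,),
    )
    return [description for (description,) in rows]
```

Note that the 'level >= 10' requirement is stored once and shared by both quests through the map table.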
The thread in @Shea Daniel's answer is on the right track: the specification for a quest is non-relational, and also includes logic as well as data.
Using XML or Lua are examples, but the more general idea is to develop your own Domain-Specific Language to encode quests. Here are a few articles about this concept, related to game design:
The Whimsy Of Domain-Specific Languages
Using a Domain Specific Language for Behaviors
Using Domain-Specific Modeling towards Computer Games Development Industrialization
You can store the block of code for a given quest into a TEXT field in your database, but you won't have much flexibility to use SQL to query specific parts of it. For instance, given the skills a character currently has, which quests are open to him? This won't be easy to query in SQL, if the quest prerequisites are encoded in your DSL in a TEXT field.
You can try to encode individual prerequisites in a relational manner, but it quickly gets out of hand. Relational and object-oriented just don't go well together. You can try to model it this way:
Chars <--- CharAttributes --> AllAttributes <-- QuestPrereqs --> Quests
And then do a LEFT JOIN looking for any quests for which no prereqs are missing in the character's attributes. Here's pseudo-code:
SELECT quest_id
FROM QuestPrereqs
JOIN AllAttributes
LEFT JOIN CharAttributes
GROUP BY quest_id
HAVING COUNT(AllAttributes) = COUNT(CharAttributes);
But the problem with this is that now you have to model every aspect of your character that could be a prerequisite (stats, skills, level, possessions, quests completed) as some kind of abstract "Attribute" that fits into this structure.
This solves this problem of tracking quest prerequisites, but it leaves you with another problem: the character is modeled in a non-relational way, essentially an Entity-Attribute-Value architecture which breaks a bunch of relational rules and makes other types of queries incredibly difficult.
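For reference, the pseudo-code query can be made runnable along these lines. This is a sketch on an in-memory SQLite database with invented sample data; the snake_case table names stand in for the entities in the diagram above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE all_attributes (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE char_attributes (
        char_id      INTEGER NOT NULL,
        attribute_id INTEGER NOT NULL REFERENCES all_attributes(id),
        PRIMARY KEY (char_id, attribute_id)
    );
    CREATE TABLE quest_prereqs (
        quest_id     INTEGER NOT NULL,
        attribute_id INTEGER NOT NULL REFERENCES all_attributes(id),
        PRIMARY KEY (quest_id, attribute_id)
    );
    INSERT INTO all_attributes VALUES (1, 'level 10'), (2, 'skill: riding'), (3, 'item: lamp');
    INSERT INTO quest_prereqs VALUES (1, 1), (1, 2), (2, 3);
    INSERT INTO char_attributes VALUES (100, 1), (100, 2), (200, 1), (200, 3);
""")

# COUNT(column) skips NULLs, so the two counts match only when the
# LEFT JOIN found a character attribute for every prerequisite row.
OPEN_QUESTS_SQL = """
    SELECT qp.quest_id
    FROM quest_prereqs AS qp
    LEFT JOIN char_attributes AS ca
           ON ca.attribute_id = qp.attribute_id AND ca.char_id = ?
    GROUP BY qp.quest_id
    HAVING COUNT(qp.attribute_id) = COUNT(ca.attribute_id)
    ORDER BY qp.quest_id
"""


def open_quests(char_id):
    """Quest ids whose prerequisites are all present among the character's attributes."""
    return [quest_id for (quest_id,) in conn.execute(OPEN_QUESTS_SQL, (char_id,))]
```

A quest with no prerequisite rows never appears in this result, so such quests would need to be handled separately.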
Not directly related to the design of your database, but a similar question was asked a few weeks back about class diagram examples for an RPG
I'm sure you can find something useful in there :)
Regarding your basic structure, you may (depending on the nature of your game) want to consider driving toward convergence of representation between player character and non-player characters, so that code that would naturally operate the same on either doesn't have to worry about the distinction. This would suggest, instead of having user and monster tables, having a character table that represents everything PCs and NPCs have in common, and then a user table for information unique to PCs and/or user accounts. The user table would have a character_id foreign key, and you could tell a player character row by the fact that a user row exists corresponding to it.
For representing quests in a model like yours, the way I would do it would look like:
quest_model
===============
id
name ['Quest for the Holy Grail', 'You Killed My Father', etc.]
etc.
quest_model_req_type
===============
id
name ['Minimum Level', 'Skill', 'Equipment', etc.]
etc.
quest_model_req
===============
id
quest_id
quest_model_req_type_id
value [10 (for Minimum Level), 'Horseback Riding' (for Skill), etc.]
quest
===============
id
quest_model_id
user_id
status
etc.
So a quest_model is the core definition of the quest structure; each quest_model can have 0..n associated quest_model_req rows, which are requirements specific to that quest model. Every quest_model_req is associated with a quest_model_req_type, which defines the general type of requirement: achieving a Minimum Level, having a Skill, possessing a piece of Equipment, and so on. The quest_model_req also has a value, which configures the requirement for this specific quest; for example, a Minimum Level type requirement might have a value of 20, meaning you must be at least level 20.
The quest table, then, is individual instances of quests that players are undertaking or have undertaken. The quest is associated with a quest_model and a user (or perhaps character, if you ever want NPCs to be able to do quests!), and has a status indicating where the progress of the quest stands, and whatever other tracking turns out useful.
This is a bare-bones structure that would, of course, have to be built out to accommodate the needs of particular games, but it should illustrate the direction I'd recommend.
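One way the application side might evaluate quest_model_req rows is sketched below. The `CHECKERS` table, the character dictionary and the `unmet_requirements` function are assumptions; the requirement type names are the ones listed in the tables above.

```python
# Hypothetical checker per quest_model_req_type name: (character, value) -> bool.
CHECKERS = {
    "Minimum Level": lambda character, value: character["level"] >= int(value),
    "Skill": lambda character, value: value in character["skills"],
    "Equipment": lambda character, value: value in character["equipment"],
}


def unmet_requirements(character, quest_model_reqs):
    """Return the (type name, value) rows the character does not satisfy.

    quest_model_reqs is an iterable of (requirement type name, value) pairs,
    as they would come out of a join of quest_model_req and quest_model_req_type.
    """
    return [(type_name, value)
            for type_name, value in quest_model_reqs
            if not CHECKERS[type_name](character, value)]


# Example rows for one hypothetical quest model.
grail_reqs = [("Minimum Level", "10"), ("Skill", "Horseback Riding")]
knight = {"level": 12, "skills": {"Horseback Riding"}, "equipment": set()}
squire = {"level": 4, "skills": {"Horseback Riding"}, "equipment": set()}
```

Since the value column holds different kinds of data depending on the type, each checker is responsible for interpreting its own value.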
Oh, and since someone else threw around their credentials, mine are that I've been a hobbyist game developer on live, public-facing projects for 16 years now.
I'd be extremely careful of what you actually store in a DB, especially for an MMORPG. Keep in mind, these things are designed to be MASSIVE with thousands of users, and game code has to execute excessively quickly and send a crap-ton of data over the network, not only to the players on their home connections but also between servers on the back-end. You're also going to have to scale out eventually and databases and scaling out are not two things that I feel mix particularly well, particularly when you start sharding into different regions and then adding instance servers to your shards and so on. You end up with a whole lot of servers talking to databases and passing a lot of data, some of which isn't even relevant to the game at all (SQL text going to a SQL server is useless network traffic that you should cut down on).
Here's a suggestion: Limit your SQL database to storing only things that will change as players play the game. Monsters and monster stats will not change. Items and item stats will not change. Quest goals will not change. Don't store these things in a SQL database, instead store them in the code somewhere.
Doing this means that every server that ever lives will always know all of this information without ever having to query a database. Now, you don't store quests at all; you just store accomplishments of the player, and the game programmatically determines the effects of those quests being completed. You don't waste data transferring information between servers because you're only sending event IDs or something of that nature (you can optimize the data you pass by using just enough bits to represent all the event IDs, and this will cut down on network traffic. It may seem insignificant, but nothing is insignificant in massive network apps).
Do the same thing for monster stats and item stats. These things don't change during gameplay, so there's no need to keep them in a DB at all, and therefore this information NEVER needs to travel over the network. The only thing you store is the ID of the items or monster kills or anything like that which is non-deterministic (i.e. it can change during gameplay in a way which you can't predict). If you end up having huge numbers of these things that occupy too much memory, you can have dedicated item servers or monster stat servers and add those to your shards, then just pass the data that's necessary for a particular quest or area to the instance server that is handling it, to cut down further on space. Keep in mind that this will increase the amount of data you need to pass over the network to spool up a new instance server, so it's a trade-off. As long as you're aware of the consequences of this trade-off, you can use good judgement and decide what you want to do. Another possibility is to limit instance servers to a particular quest/region/event/whatever and only equip each one with enough information for the thing it's responsible for, but this is more complex and potentially limits your scaling out, since resource allocation will become static instead of dynamic (if you have 50 servers, one for each quest, and suddenly everyone goes on the same quest, you'll have 49 idle servers and one really swamped server). Again, it's a trade-off, so be sure you understand it and make good choices for your application.
Once you've identified exactly what information in your game is non-deterministic, you can design a database around that information. That becomes a bit easier: players have stats, players have items, players have skills, players have accomplishments, etc., all fairly easy to map out. You don't need descriptions for things like skills, accomplishments, items, etc., or even their effects or names or anything, since the server can determine all that stuff for you from the IDs of those things at runtime without needing a database query.
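As a rough illustration of the split described above, here is a minimal Python sketch in which static item definitions live in code and a player record holds nothing but IDs. The names ItemDef, ITEM_DEFS, player_row and describe_inventory, and the sample data, are made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ItemDef:
    """Static item data that never changes during gameplay."""
    name: str
    damage: int


# Hypothetical definitions shipped with every server build, so no server
# ever has to query a database (or another server) for them.
ITEM_DEFS = {
    1: ItemDef("Rusty Dagger", 3),
    2: ItemDef("Sword of 1000 Truths", 250),
}

# The only non-deterministic data: which item IDs this player owns.
# This is all a database row would need to hold.
player_row = {"player_id": 42, "item_ids": [2, 1]}


def describe_inventory(row):
    """Resolve stored IDs against the in-memory definitions."""
    return [ITEM_DEFS[item_id].name for item_id in row["item_ids"]]
```

Calling describe_inventory(player_row) resolves the names entirely from memory; only the list of IDs would ever be read from or written to storage.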
Now, a lot of this probably sounds like overkill to you. After all, a good database can do queries very rapidly. However, your bandwidth is extremely precious, even in the data center, so you need to limit your use of it to only what is absolutely necessary to send and only send that data when it's absolutely necessary that it be sent.
Now, for representing quests in code, I would consider the specification pattern (http://en.wikipedia.org/wiki/Specification_pattern). This will allow you to easily build up quest goals in terms of what events are needed to ensure that the specification for completing that quest is met. You can then use Lua (or something similar) to define your quests as you build the game, so that you don't have to make massive code changes and rebuild the whole damn thing just to require killing 11 monsters instead of 10 to get the Sword of 1000 Truths in a particular quest. How to actually do something like that is, I think, beyond the scope of this answer and starts to hit the edge of my knowledge of game programming, so maybe someone else on here can help you out if you choose to go that route.
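To make the specification pattern a little more concrete, here is a small Python sketch (Python rather than Lua, purely for illustration). The class names Spec, AndSpec, KilledAtLeast and HasItem, the IDs, and the idea of keeping a player's events in a Counter are all assumptions of this example, not part of any particular engine.

```python
from collections import Counter
from dataclasses import dataclass


class Spec:
    """Base specification: answers whether a player's events satisfy it."""

    def is_satisfied_by(self, events):
        raise NotImplementedError

    def __and__(self, other):
        # Lets specifications be combined as `spec_a & spec_b`.
        return AndSpec(self, other)


@dataclass
class AndSpec(Spec):
    left: Spec
    right: Spec

    def is_satisfied_by(self, events):
        return (self.left.is_satisfied_by(events)
                and self.right.is_satisfied_by(events))


@dataclass
class KilledAtLeast(Spec):
    monster_id: int
    count: int

    def is_satisfied_by(self, events):
        return events[("kill", self.monster_id)] >= self.count


@dataclass
class HasItem(Spec):
    item_id: int

    def is_satisfied_by(self, events):
        return events[("item", self.item_id)] > 0


# A quest goal built from smaller specifications (hypothetical IDs).
sword_quest = KilledAtLeast(monster_id=7, count=10) & HasItem(item_id=3)
```

Changing the goal from 10 kills to 11 then only touches the quest definition, which is the part you would move into a script or data file.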
Also, I know I used a lot of terms in this answer, please ask if there are any that you are unfamiliar with and I can explain them.
Edit: didn't notice your addition about craftable items. I'm going to assume that these are things that a player can create specifically in the game, like custom items. If a player can continually change these items, then you can just combine the attributes of what they're crafted from at runtime, but you'll need to store the ID of each attribute in the DB somewhere. If you allow only a finite number of things to be added on (like gems in Diablo II), then you can eliminate a join by just adding that number of columns to the table. If there are a finite number of items that can be crafted and a finite number of ways that different things can be joined together into new items, then when certain items are combined you needn't store the combined attributes; the result just becomes a new item which you have already defined at some point. Then the player just has that item instead of its components. If you clarify the behavior your game is to have, I can add additional suggestions if that would be useful.
I would approach this from an object-oriented point of view, rather than a data-centric point of view. It looks like you might have quite a lot of (possibly complex) objects. I would recommend getting them modeled (with their relationships) first, and relying on an ORM for persistence.
When you have a data-centric problem, the database is your friend. What you have done so far seems to be quite right.
On the other hand, the other problems you mention seem to be behaviour-centric. In this case, an object-oriented analysis and solution will work better.
For example:
Create a quest class with specificQuest child classes. Each child should implement a bool HasRequirements(Player player) method.
Another option is some sort of rules engine (Drools, for example, if you are using Java).
If I were designing a database for such a situation, I might do something like this:
Quest
[quest properties like name and description]
reqItemsID
reqSkillsID
reqPlayerTypesID
RequiredItems
ID
item
RequiredSkills
ID
skill
RequiredPlayerTypes
ID
type
In this, the IDs map to the respective tables; you then retrieve all entries under that ID to get the list of required items, skills, what have you. If you allow dynamic creation of items, then you should have a mapping to another table that contains all possible items.
Another thing to keep in mind is normalization. There's a long article here, but I've condensed the first three levels into the following, more or less:
first normal form means that there are no database entries where a specific field has more than one item in it
second normal form means that, in each table with a composite primary key, all other fields are fully dependent on the entire key and not just parts of it
third normal form means that no table has non-key fields that are dependent on other non-key fields
[Disclaimer: I have very little experience with SQL databases, and am new to this field. I just hope I'm of help.]
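Here is one way the lookup described above might look, as a minimal sketch using Python's built-in sqlite3 module. The table and column names follow the outline; the use of the quest name as primary key, the sample rows, and the helper required_items are assumptions added for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Quest (
    name        TEXT PRIMARY KEY,
    description TEXT,
    reqItemsID  INTEGER,
    reqSkillsID INTEGER
);
-- Several rows may share one ID; together they form the requirement list.
CREATE TABLE RequiredItems  (ID INTEGER, item  TEXT);
CREATE TABLE RequiredSkills (ID INTEGER, skill TEXT);
""")

# Hypothetical sample data.
conn.execute("INSERT INTO Quest VALUES ('Find the lost tomb', 'demo quest', 1, 1)")
conn.executemany("INSERT INTO RequiredItems VALUES (?, ?)",
                 [(1, "torch"), (1, "rope")])
conn.executemany("INSERT INTO RequiredSkills VALUES (?, ?)",
                 [(1, "climbing")])


def required_items(conn, quest_name):
    """Return every item listed under the quest's reqItemsID."""
    rows = conn.execute(
        "SELECT r.item FROM Quest q "
        "JOIN RequiredItems r ON r.ID = q.reqItemsID "
        "WHERE q.name = ? ORDER BY r.item",
        (quest_name,),
    )
    return [item for (item,) in rows]
```

The same join shape works for RequiredSkills and RequiredPlayerTypes.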
I've done something sort of similar, and my general solution was to use a lot of metadata. I'm using the term loosely to mean that any time I needed new data to make a given decision (allow a quest, allow using an item, etc.) I would create a new attribute. This was basically just a table with an arbitrary number of values and descriptions. Then each character would have a list of these types of attributes.
Ex: List of Kills, Level, Regions visited, etc.
The two things this does to your dev process are:
1) Every time there's an event in the game you need to have a big old switch block that checks all these attribute types to see if something needs updating
2) Every time you need some data, check all your attribute tables BEFORE you add a new one.
I found this to be a good rapid development strategy for a game that grows organically (not completely planned out on paper ahead of time), but its one big limitation is that your past/current content (levels, events, etc.) will not be compatible with future attributes, i.e. that map won't give you a region badge because there were no region badges when you coded it. This of course requires you to update past content when new attributes are added to the system.
Just some little points for your consideration:
1) Always try to make your "get quest" requirements simple and your "finish quest" requirements complicated.
The first part can be done by arranging your quests in a hierarchical order.
Example:
QuestA: Kill Raven the demon (quest req: Lvl 1)
QuestA.1: Save "unknown" in the forest to obtain some info (quest req: QuestA)
QuestA.2: Craft the sword of Crystal, etc. (quest req: QuestA.1 == Done)
QuestA.3: ... (quest req: QuestA.2 == Done)
QuestA.4: ... (quest req: QuestA.3 == Done)
etc.
QuestB: Find the lost tomb (quest req: QuestA.status == Done)
QuestC: Go to the demons' Hypermarket (quest req: QuestA.status == Done && player.level == 10)
etc.
Doing this would save you lots of data fields and table joins.
ADDITIONAL THOUGHTS:
If you use the above system, you can add an extra reward field to your quest table called "enableQuests" and put in it the names of the quests that need to be enabled.
Logically, you'd then have an "enabled" field assigned to each quest.
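A hierarchy like this can be checked with very little code. The Python sketch below is only an illustration under some simplifying assumptions: the QUESTS table and available_quests are invented names, QuestA is treated as done once its last step (QuestA.3 here) is done, and the level check uses "at least" rather than "exactly".

```python
# Each quest lists the quests that must be done first and a minimum level.
QUESTS = {
    "QuestA.1": {"requires": [], "min_level": 1},
    "QuestA.2": {"requires": ["QuestA.1"], "min_level": 1},
    "QuestA.3": {"requires": ["QuestA.2"], "min_level": 1},
    "QuestB": {"requires": ["QuestA.3"], "min_level": 1},
    "QuestC": {"requires": ["QuestA.3"], "min_level": 10},
}


def available_quests(done, level):
    """Quests the player may pick up now, given finished quests and level."""
    return [
        name
        for name, quest in QUESTS.items()
        if name not in done
        and level >= quest["min_level"]
        and all(req in done for req in quest["requires"])
    ]
```

Because each quest names only its direct predecessor, the "get quest" check stays a single lookup per quest, however long the chain becomes.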
2) A minor solution for your crafting problem: create crafting recipes, i.e. items that store the crafting requirements of the item to be crafted.
So when a player wants to craft an item, he needs to buy a recipe first and then try crafting.
A simple example of such an item description would be:
ItemName: "Legendary Sword of the dead"
Craft level req.: 75
Items required:
Item_1: Blade of the dead
Item_2: A cursed seal
Item_3: Holy Gemstone of the dead
etc.
When he presses the "craft" action, you can parse the recipe and compare it against his inventory/craft box.
So your crafting DB will have only 1 field (or 2 if you want to add a crafting level requirement, though that will already be included in the recipe).
ADDITIONAL THOUGHTS:
Such items can be stored in XML format in the table, which would make them much easier to parse.
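For what it's worth, parsing such a recipe and comparing it with an inventory might look roughly like the Python sketch below. The XML layout (a recipe element with a craft_level attribute and item children) and the function can_craft are assumptions made up for this example; any layout that carries the same information would do.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical XML layout for the recipe described above.
RECIPE_XML = """
<recipe name="Legendary Sword of the dead" craft_level="75">
  <item>Blade of the dead</item>
  <item>A cursed seal</item>
  <item>Holy Gemstone of the dead</item>
</recipe>
"""


def can_craft(recipe_xml, craft_level, inventory):
    """True if the level is high enough and every required item is owned."""
    recipe = ET.fromstring(recipe_xml.strip())
    if craft_level < int(recipe.get("craft_level")):
        return False
    needed = Counter(item.text for item in recipe.findall("item"))
    have = Counter(inventory)
    # Counter returns 0 for missing names, so absent items fail the check.
    return all(have[name] >= qty for name, qty in needed.items())
```

Using Counter means a recipe that lists the same ingredient twice is handled without extra code.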
3) A similar XML system can be applied to your quest system to implement quest-ending requirements.

Good open-source examples of using entity groups in App Engine? [closed]

I know all details about how entity groups work in GAE's storage, but yesterday (at the App Engine meetup in Palo Alto), as a presenter was explaining his use of entity groups, it struck me that I've never really made use of them in my own GAE apps, and I don't recall seeing them used in open-source GAE apps I've used.
So, I suspect I've just been overlooking (not noticing or remembering) such examples because I'm simply not used to them enough to immediately connect "use of entity group" to "kind of application problems being solved" -- and I think I should remedy that by studying such sources with this goal in mind, focusing on what problem the EG use is solving (i.e., why the app works with it, but wouldn't work or wouldn't work well without it).
Can anybody suggest good URLs to such code? (Essays would also be welcome, if they focus on application-level problem solving, but not if, like most I've seen, they just focus on the details of how EGs work!-).
The main use of entity groups is to provide the means to update more than one entity in a transaction.
If you haven't had to use them, count your blessings. Either you have been designing your data models such that no two entities ever need to be updated at the same time in order to remain consistent, or else you do need them but you've gotten lucky :)
Imagine that I have an Invoice entity type, and a LineItem entity type. One Invoice can have multiple LineItems associated with it. My Invoice entity has a field called LastUpdated. Any time a LineItem gets added to my Invoice, I want to store the current date in the LastUpdated field.
My update function might look like this (pseudocode)
invoice.lastUpdated = now()
lineitem = new lineitem()
invoice.put()
lineitem.put()
What happens if the invoice put() succeeds and the lineitem put() fails? My invoice date will show that something was updated, but the actual update (the new LineItem) won't be there. The solution is to wrap both put() calls in a transaction.
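The all-or-nothing behaviour can be tried out locally. The sketch below deliberately uses Python's built-in sqlite3 module as a stand-in, not the App Engine datastore API, so it only illustrates the idea of a transaction; the table layout and the function add_line_item are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoice (id INTEGER PRIMARY KEY, last_updated TEXT)")
conn.execute(
    "CREATE TABLE lineitem ("
    "id INTEGER PRIMARY KEY, "
    "invoice_id INTEGER NOT NULL, "
    "description TEXT NOT NULL)"
)
conn.execute("INSERT INTO invoice VALUES (1, '2009-01-01')")
conn.commit()


def add_line_item(conn, invoice_id, description, now):
    """Update the invoice date and add the line item as a single unit."""
    try:
        # As a context manager the connection commits on success and
        # rolls back if an exception escapes the block.
        with conn:
            conn.execute(
                "UPDATE invoice SET last_updated = ? WHERE id = ?",
                (now, invoice_id),
            )
            conn.execute(
                "INSERT INTO lineitem (invoice_id, description) VALUES (?, ?)",
                (invoice_id, description),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

If the second statement fails (for instance, a missing description violates NOT NULL), the date change from the first statement is rolled back as well, so the invoice never claims an update that did not happen.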
An alternative solution would be to use a query to find the date of the last inserted LineItem, instead of storing this data in the lastUpdated field. But that would involve fetching both the Invoice and all the LineItems every time you wanted to know the last time a lineitem was added, costing you precious datastore quota.
EDIT TO RESPOND TO POSTER's COMMENTS
Ah. I think I understand your confusion. The above paragraphs establish why transactions are important. But you say you still don't care about entity groups, because you don't see how they relate to transactions. But if you are using db.run_in_transaction, then you are using entity groups, perhaps without realizing it! Every transaction involves one and only one entity group, and any given transaction can only affect entities belonging to the same group. See here:
"All datastore operations in a transaction must operate on entities in the same entity group".
What kind of stuff are you doing in your transactions? There are plenty of good reasons to use transactions with just one entity, which by default is in its own entity group. But sometimes you need to keep 2 or more entities in sync, like in my example above. If the Invoice and the LineItem entities are not in the same entity group, then you cannot wrap the modifications to them in a db.run_in_transaction call. So any time you want to operate on 2 or more entities transactionally, you need to first make sure they are in the same group. Hope that makes it clearer why they are useful.
I've used them here. I'm setting my customer object as the parent of the map markers. This creates an entity group for each customer and gives me two advantages:
Getting the markers of a customer is much faster, because they're stored physically with the customer object (on the same server, probably on the same disk).
I can change the markers for a customer in a transaction. I suspect the reason transactions require all objects that they operate on to be in the same group is because they're stored in the same physical location, which makes it easier to implement a lock on the data.
I've used them here in this simple wiki system. The latest version of a page is always a root entity and past versions have the latest version as ancestor. The copy operation is done in a transaction to keep the version consistency and avoid losing a version in case of concurrency.

Best practices for consistent and comprehensive address storage in a database [closed]

Are there any best practices (or even standards) to store addresses in a consistent and comprehensive way in a database ?
To be more specific, I believe at this stage that there are two cases for address storage :
you just need to associate an address to a person, a building or any item (the most common case). Then a flat table with text columns (address1, address2, zip, city) is probably enough. This is not the case I'm interested in.
you want to run statistics on your addresses : how many items in a specific street, or city or... Then you want to avoid misspellings of any sorts, and ensure consistency. My question is about best practices in this specific case : what are the best ways to model a consistent address database ?
A country specific design/solution would be an excellent start.
ANSWER : There does not seem to exist a perfect answer to this question yet, but :
xAL, as suggested by Hank, is the closest thing to a global standard that popped up. It seems to be quite an overkill though, and I am not sure many people would want to implement it in their database...
To start one's own design (for a specific country), Dave's link to the Universal Postal Union (UPU) site is a very good starting point.
As for France, there is a norm (not official, but a de facto standard) for addresses, which bears the lovely name of AFNOR XP Z10-011 (French only), and has to be paid for. The UPU description for France is based on this norm.
I happened to find the equivalent norm for Sweden : SS 613401.
At European level, some effort has been made, resulting in the norm EN 14142-1. It is obtainable via CEN national members.
I've been thinking about this myself as well. Here are my loose thoughts so far, and I'm wondering what other people think.
xAL (and its sister that includes personal names, xNAL) is used by both Google and Yahoo's geocoding services, giving it some weight. But since the same address can be described in xAL in many different ways, some more specific than others, I don't see how xAL itself is an acceptable format for data storage. Some of its field names could be used, however. In reality, the only basic format that can be used among the 16 countries that my company ships to is the following:
enum address-fields
{
name,
company-name,
street-lines[], // up to 4 free-type street lines
county/sublocality,
city/town/district,
state/province/region/territory,
postal-code,
country
}
That's easy enough to map into a single database table, just allowing for NULLs on most of the columns. And it seems that this is how Amazon and a lot of organizations actually store address data. So the question that remains is how should I model this in an object model that is easily used by programmers and by any GUI code. Do we have a base Address type with subclasses for each type of address, such as AmericanAddress, CanadianAddress, GermanAddress, and so forth? Each of these address types would know how to format themselves and optionally would know a little bit about the validation of the fields.
They could also return some type of metadata about each of the fields, such as the following pseudocode data structure:
structure address-field-metadata
{
field-number, // corresponds to the enumeration above
field-index, // the order in which the field is usually displayed
field-name, // a "localized" name; US == "State", CA == "Province", etc
is-applicable, // whether or not the field is even looked at / valid
is-required, // whether or not the field is required
validation-regex, // an optional regex to apply against the field
allowed-values[] // an optional array of specific values the field can be set to
}
In fact, instead of having individual address objects for each country, we could take the slightly less object-oriented approach of having an Address object that eschews .NET properties and uses an AddressStrategy to determine formatting and validation rules:
object address
{
set-field(field-number, field-value),
address-strategy
}
object address-strategy
{
validate-field(field-number, field-value),
cleanse-address(address),
format-address(address, formatting-options)
}
When setting a field, that Address object would invoke the appropriate method on its internal AddressStrategy object.
The reason for using a SetField() method approach rather than properties with getters and setters is so that it is easier for code to actually set these fields in a generic way without resorting to reflection or switch statements.
You can imagine the process going something like this:
GUI code calls a factory method or some such to create an address based on a country. (The country dropdown, then, is the first thing that the customer selects, or has a good guess pre-selected for them based on culture info or IP address.)
GUI calls address.GetMetadata() or a similar method and receives a list of the AddressFieldMetadata structures as described above. It can use this metadata to determine what fields to display (ignoring those with is-applicable set to false), what to label those fields (using the field-name member), display those fields in a particular order, and perform cursory, presentation-level validation on that data (using the is-required, validation-regex, and allowed-values members).
GUI calls the address.SetField() method using the field-number (which corresponds to the enumeration above) and its given values. The Address object or its strategy can then perform some advanced address validation on those fields, invoke address cleaners, etc.
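The flow above could be sketched along these lines in Python. Everything here is illustrative: the names AddressField, FieldMetadata, UsAddressStrategy and Address, the reduced set of fields, and especially the two regexes are assumptions for the example, not real postal validation rules.

```python
import re
from dataclasses import dataclass
from enum import Enum


class AddressField(Enum):
    STREET = 1
    CITY = 2
    STATE = 3
    POSTAL_CODE = 4


@dataclass(frozen=True)
class FieldMetadata:
    field_id: AddressField
    name: str                    # localized label, e.g. "State" vs "Province"
    is_required: bool = True
    validation_regex: str = ""   # empty means "no pattern check"


class UsAddressStrategy:
    """Hypothetical US rules; the patterns are deliberately simplistic."""

    metadata = [
        FieldMetadata(AddressField.STREET, "Street"),
        FieldMetadata(AddressField.CITY, "City"),
        FieldMetadata(AddressField.STATE, "State", validation_regex=r"[A-Z]{2}"),
        FieldMetadata(AddressField.POSTAL_CODE, "ZIP code",
                      validation_regex=r"\d{5}(-\d{4})?"),
    ]

    def validate_field(self, field_id, value):
        meta = next(m for m in self.metadata if m.field_id == field_id)
        if not value:
            return not meta.is_required
        if not meta.validation_regex:
            return True
        return re.fullmatch(meta.validation_regex, value) is not None


class Address:
    def __init__(self, strategy):
        self.strategy = strategy
        self.fields = {}

    def get_metadata(self):
        # The GUI uses this to decide which fields to show and how to label them.
        return self.strategy.metadata

    def set_field(self, field_id, value):
        # Generic setter: no reflection or switch statement needed by callers.
        if not self.strategy.validate_field(field_id, value):
            raise ValueError(f"invalid value for {field_id.name}: {value!r}")
        self.fields[field_id] = value
```

A GUI would loop over get_metadata() to build the form and then call set_field() once per input, letting the strategy reject bad values.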
There could be slight variations on the above if we want to make the Address object itself behave like an immutable object once it is created. (Which I will probably try to do, since the Address object is really more like a data structure, and probably will never have any true behavior associated with itself.)
Does any of this make sense? Am I straying too far off of the OOP path? To me, this represents a pretty sensible compromise between being so abstract that implementation is nigh-impossible (xAL) versus being strictly US-biased.
Update 2 years later: I eventually ended up with a system similar to this and wrote about it at my defunct blog.
I feel like this solution is the right balance between legacy data and relational data storage, at least for the e-commerce world.
I'd use an Address table, as you've suggested, and I'd base it on the data tracked by xAL.
In the UK there is a product called PAF from Royal Mail
This gives you a unique key per address - there are hoops to jump through, though.
I basically see 2 choices if you want consistency:
Data cleansing
Basic data table look ups
Ad 1. I work with the SAS System, and SAS Institute offers a tool for data cleansing. It basically performs some checks and validations on your data, and suggests that "Abram Lincoln Road" and "Abraham Lincoln Road" be merged into the same street. I also think it draws on national databases containing city/postal code matches and so on.
Ad 2. You build up a multiple choice list (i.e. basic data), and people adding new entries pick from existing entries in your basic data. In your fact table, you store keys to street names instead of the street names themselves. If you detect a spelling error, you just correct it in your basic data, and all instances are corrected with it, through the key relation.
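The key relation in option 2 can be shown with a tiny example. This is a sketch using Python's sqlite3 module; the table names street and address, the columns, and the helper addresses_on are assumptions chosen for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Basic data: one row per street name.
CREATE TABLE street (
    id   INTEGER PRIMARY KEY,
    name TEXT UNIQUE NOT NULL
);
-- Fact table: stores the key, never the street name itself.
CREATE TABLE address (
    id           INTEGER PRIMARY KEY,
    house_number TEXT,
    street_id    INTEGER NOT NULL REFERENCES street(id)
);
""")

conn.execute("INSERT INTO street (id, name) VALUES (1, 'Abram Lincoln Road')")
conn.executemany(
    "INSERT INTO address (house_number, street_id) VALUES (?, 1)",
    [("10",), ("12",)],
)

# Correcting the misspelling once fixes every address that references it.
conn.execute("UPDATE street SET name = 'Abraham Lincoln Road' WHERE id = 1")


def addresses_on(conn, street_name):
    """Count the addresses on a given street via the key relation."""
    row = conn.execute(
        "SELECT COUNT(*) FROM address a "
        "JOIN street s ON s.id = a.street_id WHERE s.name = ?",
        (street_name,),
    ).fetchone()
    return row[0]
```

Statistics such as "how many items on a given street" then become a simple join on the key, with no risk of two spellings splitting the count.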
Note that these options don't rule out each other, you can use both approaches at the same time.
In the US, I'd suggest choosing a National Change of Address vendor and modeling the DB after what they return.
The authorities on how addresses are constructed are generally the postal services, so for a start I would examine the data elements used by the postal services for the major markets you operate in.
See the website of the Universal Postal Union for very specific and detailed information on international postal address formats: http://www.upu.int/post_code/en/postal_addressing_systems_member_countries.shtml
"xAL is the closest thing to a global standard that popped up. It seems to be quite an overkill though, and I am not sure many people would want to implement it in their database..."
This is not a relevant argument. Implementing addresses is not a trivial task if the system needs to be "comprehensive and consistent" (i.e. worldwide). Implementing such a standard is indeed time-consuming, but it is nevertheless mandatory if you want to meet the specified requirement.
Normalize your database schema and you'll have the perfect structure for correct consistency. This is why:
http://weblogs.sqlteam.com/mladenp/archive/2008/09/17/Normalization-for-databases-is-like-Dependency-Injection-for-code.aspx
I asked something quite similar earlier: Dynamic contact information data/design pattern: Is this in any way feasible?.
The short answer: storing addresses or any kind of contact information in a database is complex. The eXtensible Address Language (xAL) link above has some interesting information, and it is the closest thing to a standard/best practice that I've come across...

Resources