Database Schema for Storing C++ Classes - database

I'm trying to come up with a way of storing the underlying structure of C++ classes in a database, and trying to determine what the best type of database would be/how to lay out information in that database.
C++ classes by themselves as code would be parsed into an AST, which would suggest a tree as a possibile data structure, which would fit well into a graph database. However, I don't think the data could be stored purely as a tree, once you consider that pointers could create a loop. The thought then would be a graph. Being someone primarily familiar with relational databases, I'm not sure what the plausibility of that is. The primary pieces of data that would be needed for a class are:
Class name
Children nodes, including their type and offset within the struct
For children nodes that aren't a primitive type, they would have a relationship to the class. This by itself seems like it would fit well inside of a graph database. The main thing I'm having a hard time envisioning is how things like pointers or arrays would be stored. Would those just have different attributes on the edge between the two classes? Is there some other way of storing this data that I'm missing which would work better?

Related

How to store user defined data structures?

I'm building a mobile application that records information about items and then outputs an automatically generated report.
Each Item may be of various types, each type requires different information to be recorded. The user needs to be able to specify what is to be stored for each type.
Is there a "best" way to store this type of information in a relational database?
My current plan is to have a Type table that maps Types to Attributes that need to be recorded for that Type. Does this sound sensible? I imagine that it may get messy when I come to produce reports from this data.
I guess I need a way of generalising the information that needs to be recorded?
I think I just need some pointers in the right direction.
Thanks!
Only a suggestion, might not be an answer... use JSON and go for no-sql database. Today it is more convenient to operate and play around with data in not strictly relational database format.
That way you can define a model(s), or create you own data structure as mentioned and store it easily as a collection of documents of that model. Also no-sql allows structure changes without obligating you to define entire "column" for all "rows" present there ;)
Check this out about MongoDB and NoSQL explanation.
This is also a beatiful post that i love about data modeling in
NoSQL.

difference between database and data structure?

The major operations we do are insertion,deletion and searching in any kind of data structure,which can also be done using database queries then what is the use of data structure?
which makes data structure unique from database?
Data structure shows how the objects in the problem is modeled and organized.
For example,
Your shopping items are organized linearly into an array;
Your company's org chart is modeled in a tree;
Facebook connections are organized as a huge graph.
The problem which data structure solves is how to model the objects in real world logically so that we can solve the problem in a computational manner.
Database is about how the information is persisted. The data in data structure may be persisted into database if necessary and may not be.
At first it seems like a very stupid question , But really is not . The answers can turn your world upside down . (May be). So lets divide and conquer the question
Data Structure
Loosely speaking,
“ A data structure is a specialized format for organizing and storing data. Any data structure is designed to organize data to suit a specific purpose so that it can be used according to needs, stored normally on RAM” -wikipedia
Database
“A database is an organized collection of data. It is the collection of database objects stored normally on hard disk.” -wikipedia
The Misconceptions
Well to be honest, It is a misconception that term data structure is applied to those structures that live in Ram, Not even close . Yes Normally Data structure resides in Ram(AKA Main Memory) but they can also live in Hard Disk(AKA Secondary Storage) . Stop being so Racist !!!!
The Similarities And the differences
Well database is nothing more than a collection of database objects that are stored in hard disk . A database management System uses a database for its own ends. To make my self perfectly clear, database objects here means schemas , tables, views , indexes , users etc
There are so many database objects . Each object can be implemented using one or more data structures .
For Example a btree/btree+ index is implemented using Btree data structure. A hash based index obviously will use hash table to resolve key to an address.
It is safe to say that Database is a collection of different data structures . These types can vary significantly based on technology and Operating System .
A database is a collection of tables (and possibly Stored Procedures, Functions, Views, etc.)
Let's keep it simple for now, though. Each table has a table structure that defines what can be placed in it. With No-SQL databases, this is different, as they are more loosey-goosey. Again, though, let's keep it simple for now.
A Database might be named anything, such as "Platypus" It can contain many tables, such as "DuckbillsInTheWild" and "DuckbillsInCaptivity" etc.
One of these tables may have the structure:
Name Data Type
-------------------------
ID int
Name VarChar
Weight Float
PoisonToeLength Float
Data structure is the memory and Database is Storage in computer. Data structure is voltile memory and database is non-volatile memory.
Data structure is a logical representation of how your data is organized. while database is a middleware to help you store the data into file system.
Suppose you are working in a social network company. Intuitively, the social network is a graph (data structure). You can apply graph algorithm here to solve practical problems. Meanwhile, the data in this graph must be persisted somewhere, so that you can read/write them. Instead of writing codes to save them into files, graph databases can help you there.
One is logical, the other is middleware (closely related to physical).
IMO, Database is a term to refer to a collection of records which could be anything such as inventory data, user's data etc. Now, how do we store it on a computer is another technical issue. I can use linked list, array, tree, graph etc. to store my records depending on the usecase. Thus, data structure is a technical term given to these structures. In short, database is more of business term or folklore term and data structure is a technical term to refer to the collection of records.
Database is the conceptual view of storing data like excel sheets ,CSV files in data sets which stores data permanent in nature while data structure is the logical representation of storing of data into memory like array,graphs etc stores data into temporary nature wise.
Data Structure is the basic requirement for implementation of Data bases or data store.
the main difference between database and data structure is that database is a collection of data that is stored and managed in permanent memory while data structure is a way of storing and arranging data efficiently in temporary memory.
Overall, data is raw and unprocessed facts.

Modeling for Graph DBs

Coming from as SQL/NoSQL background I am finding it quite challenging to model (efficiently that is) the simplest of exercises on a Graph DB.
While different technologies have limitations and best practices, I am uncertain whether the mindset that I am using while creating the models is the correct one, hence, I am in the need of guidance, advice and/or resources to help me get closer to the right practices.
The initial exercise I have tried is representing a file share entire directory (subfolders and files) in a graph DB. For instance some of the attributes and queries I would like to include are;
The hierarchical structure of the folders
The aggregate size at the current node
Being able to search based on who created a file/folder
Being able to search on file types
This brings me to the following questions
When/Which attributes should be used for edges. Only those on which I intend to search? Only relationships?
Should I wish to extend my graph capabilities, for instance, search on files bigger than X? How does one try to maximize the future capabilities/flexibility of the model so that such changes do not cause massive impacts.
Currently I am exploring InfiniteGraph and TitanDB.
1) The only attribute I can think of to describe an edge in a folder hierarchy is whether it is a contains or contained-by relationship.
(You don't even need that if you decide to consider all your edges one or the other. In your case, it looks like you'll almost always be interrogating descendants to search and to return aggregate size).
This is a lot simpler than a network, or a hierarchy where the edges may be of different types. Think an organization chart that tracks not only who manages whom, but who supports whom, mentors whom, harasses whom, whatever.
2) I'm not familiar with the two databases you mentioned, but Neo4J allows indexes on node properties, so adding an index on file_size should not have much impact. It's also "schema-less," so that you can add attributes on the fly and various nodes may contain different attributes.

Database storage design of large amounts of heterogeneous data

Here is something I've wondered for quite some time, and have not seen a real (good) solution for yet. It's a problem I imagine many games having, and that I can't easily think of how to solve (well). Ideas are welcome, but since this is not a concrete problem, don't bother asking for more details - just make them up! (and explain what you made up).
Ok, so, many games have the concept of (inventory) items, and often, there are hundreds of different kinds of items, all with often very varying data structures - some items are very simple ("a rock"), others can have insane complexity or data behind them ("a book", "a programmed computer chip", "a container with more items"), etc.
Now, programming something like that is easy - just have everything implement an interface, or maybe extend an abstract root item. Since objects in the programming world don't have to look the same on the inside as on the outside, there is really no issue with how much and what kind of private fields any type of item has.
But when it comes to database serialization (binary serialization is of course no problem), you are facing a dilemma: how would you represent that in, say, a typical SQL database ?
Some attempts at a solution that I have seen, none of which I find satisfying:
Binary serialization of the items, the database just holds an ID and a blob.
Pro's: takes like 10 seconds to implement.
Con's: Basically sacrifices every database feature, hard to maintain, near impossible to refactor.
A table per item type.
Pro's: Clean, flexible.
Con's: With a wide variety come hundreds of tables, and every search for an item has to query them all since SQL doesn't have the concept of table/type 'reference'.
One table with a lot of fields that aren't used by every item.
Pro's: takes like 10 seconds to implement, still searchable.
Con's: Waste of space, performance, confusing from the database to tell what fields are in use.
A few tables with a few 'base profiles' for storage where similar items get thrown together and use the same fields for different data.
Pro's: I've got nothing.
Con's: Waste of space, performance, confusing from the database to tell what fields are in use.
What ideas do you have? Have you seen another design that works better or worse?
It depends if you need to sort, filter, count, or analyze those attribute.
If you use EAV, then you will screw yourself nicely. Try doing reports on an EAV schema.
The best option is to use Table Inheritance:
PRODUCT
id pk
type
att1
PRODUCT_X
id pk fk PRODUCT
att2
att3
PRODUCT_Y
id pk fk PRODUCT
att4
att 5
For attributes that you don't need to search/sort/analyze, then use a blob or xml
I have two alternatives for you:
One table for the base type and supplemental tables for each “class” of specialized types.
In this schema, properties common to all “objects” are stored in one table, so you have a unique record for every object in the game. For special types like books, containers, usable items, etc, you have another table for each unique set of properties or relationships those items need. Every special type will therefore be represented by two records: the base object record and the supplemental record in a particular special type table.
PROS: You can use column-based features of your database like custom domains, checks, and xml processing; you can have simpler triggers on certain types; your queries differ exactly at the point of diverging concerns.
CONS: You need two inserts for many objects.
Use a “kind” enum field and a JSONB-like field for the special type data.
This is kind of like your #1 or #3, except with some database help. Postgres added JSONB, giving you an improvement over the old EAV pattern. Other databases have a similar complex field type. In this strategy you roll your own mini schema that you stash in the JSONB field. The kind field declares what you expect to find in that JSONB field.
PROS: You can extract special type data in your queries; can add check constraints and have a simple schema to deal with; you can benefit from indexing even though your data is heterogenous; your queries and inserts are simple.
CONS: Your data types within JSONB-like fields are pretty limited and you have to roll your own validation.
Yes, it is a pain to design database formats like this. I'm designing a notification system and reached the same problem. My notification system is however less complex than yours - the data it holds is at most ids and usernames. My current solution is a mix of 1 and 3 - I serialize data that is different from every notification, and use a column for the 2 usernames (some may have 2 or 1). I shy away from method 2 because I hate that design, but it's probably just me.
However, if you can afford it, I would suggest thinking outside the realm of RDBMS - it sounds like Non-RDBMS (especially key/value storage ones) may be a better fit to store these data, especially if item 1 and item 2 differ from each item a lot.
I'm sure this has been asked here a million times before, but in addition to the options which you have discussed in your question, you can look at EAV schema which is very flexible, but which has its own sets of cons.
Another alternative is database systems which are not relational. There are object databases as well as various key/value stores and document databases.
Typically all these things break down to some extent when you need to query against the flexible attributes. This is kind of an intrinsic problem, however. Conceptually, what does it really mean to query things accurately which are unstructured?
First of all, do you actually need the concurrency, scalability and ACID transactions of a real database? Unless you are building a MMO, your game structures will likely fit in memory anyway, so you can search and otherwise manipulate them there directly. In a scenario like this, the "database" is just a store for serialized objects, and you can replace it with the file system.
If you conclude that you do (need a database), then the key is in figuring out what "atomicity" means from the perspective of the data management.
For example, if a game item has a bunch of attributes, but none of these attributes are manipulated individually at the database level (even though they could well be at the application level), then it can be considered as "atomic" from the data management perspective. OTOH, if the item needs to be searched on some of these attributes, then you'll need a good way to index them in the database, which typically means they'll have to be separate fields.
Once you have identified attributes that should be "visible" versus the attributes that should be "invisible" from the database perspective, serialize the latter to BLOBs (or whatever), then forget about them and concentrate on structuring the former.
That's where the fun starts and you'll probably need to use "all of the above" strategy for reasonable results.
BTW, some databases support "deep" indexes that can go into heterogeneous data structures. For example, take a look at Oracle's XMLIndex, though I doubt you'll use Oracle for a game.
You seem to be trying to solve this for a gaming context, so maybe you could consider a component-based approach.
I have to say that I personally haven't tried this yet, but I've been looking into it for a while and it seems to me something similar could be applied.
The idea would be that all the entities in your game would basically be a bag of components. These components can be Position, Energy or for your inventory case, Collectable, for example. Then, for this Collectable component you can add custom fields such as category, numItems, etc.
When you're going to render the inventory, you can simply query your entity system for items that have the Collectable component.
How can you save this into a DB? You can define the components independently in their own table and then for the entities (each in their own table as well) you would add a "Components" column which would hold an array of IDs referencing these components. These IDs would effectively be like foreign keys, though I'm aware that this is not exactly how you can model things in relational databases, but you get the idea.
Then, when you load the entities and their components at runtime, based on the component being loaded you can set the corresponding flag in their bag of components so that you know which components this entity has, and they'll then become queryable.
Here's an interesting read about component-based entity systems.

Object oriented programming in Graph databases

Graph databases store data as nodes, properties and relations. If I need to retrieve some specific data from an object based upon a query, then I would need to retrieve multiple objects (as the query might have a lot of results).
Consider this simple scenario in object oriented programming in graph-databases:
I have a (graph) database of users, where each user is stored as an object. I need to retrieve a list of users living in a specific place (the place property is stored in the user object). So, how would I do it? I mean unnecessary data will be retrieved every time I need to do something (in this case, the entire user object might need to be retrieved). Isn't functional programming better in graph databases?
This example is just a simple analogy of the above stated question that came to my mind. Don't take it as a benchmark. So, the question remains, How great is object oriented programming in graph-databases?
A graph database is more than just vertices and edges. In most graph databases, such as neo4j, in addition to vertices having an id and edges having a label they have a list of properties. Typically in java based graph databases these properties are limited to java primatives -- everything else needs to be serialized to a string (e.g. dates). This mapping to vertex/edge properties can either be done by hand using methods such as getProperty and setProperty or you can something like Frames, an object mapper that uses the TinkerPop stack.
Each node has attributes that can be mapped to object fields. You can do that manually, or you can use spring-data to do the mapping.
Most graph databases have at least one kind of index for vertices/edges. InfiniteGraph, for instance, supports B-Trees, Lucene (for text) and a distributed, scaleable index type. If you don't have an index on the field that you're trying to use as a filter you'd need to traverse the graph and apply predicates yourself at each step. Hopefully, that would reduce the number of nodes to be traversed.
Blockquote I need to retrieve a list of users living in a specific place (the place property is stored in the user object).
There is a better way. Separate location from user. Instead of having a location as a property, create a node for locations.
So you can have (u:User)-[:LIVES_IN]->(l:Location) type of relationship.
it becomes easier to retrieve a list of users living in a specific place with a simple query:
match(u:User)-[:LIVES_IN]->(l:Location) where l.name = 'New York'.
return u,l.
This will return all users living in New York without having to scan all the properties of each node. It's a faster approach.
Why not use an object-oriented graph database?
InfiniteGraph is a graph database built on top of Objectivity/DB which is an massively scalable, distributed object-oriented database.
InfiniteGraph allows you to define your vertices and edges using a standard object-oriented approach, including inheritance. You can also embed a defined data type as an attribute in another data type definition.
Because InfiniteGraph is object-oriented, it give you access to query capabilities on complex data structures that are not available in the popular graph databases. Consider the following diagram:
In this diagram I create a query that determines the inclusion of the edge based on an evaluation of the set of CallDetail nodes hanging off the Call edge. I might only include the edge in my results if there exists a CallDetail with a particular date or if the sum of the callDurations of all of the CallDetails that occurred between two dates is over from threshold. This is the real power of object-oriented database in solving graph problems: You can support a much more complex data model.
I'm not sure why people have comingled the terms graph database and property graph. A property graph is but one way to implement a graph database, and not particular efficient. InfiniteGraph is a schema-based database and the schema provides several distinct advantages, one of which object placement.
Disclaimer: I am the Director of Field Operation for Objectivity, Inc., maker of InfiniteGraph.

Resources