Addressing a database as a multi-level associative array

I'm writing a web game, where it's very convenient to think about the complete game state as a single arbitrary-level hash.
A couple examples of the game state being updated:
// Defining a mission card that could be issued to a player
$game['consumables']['missions']['id04536']['name'] = "Build a Galleon for Blackbeard";
$game['consumables']['missions']['id04536']['requirements']['lumber'] = 20;
$game['consumables']['missions']['id04536']['requirements']['iron'] = 10;
$game['consumables']['missions']['id04536']['rewards']['blackbeard_standing'] = 5;
// When a player turns in this mission card with its requirements
$game['players']['id3214']['blackbeard_standing'] += 5;
This is a web game, so storing the information in a database makes sense. I need the features of a database: That the game state can be accessed from multiple browsers at the same time, that it's non-volatile and easy to back up, etc.
Essentially, I want syntax as easy as reading from/writing to an associative array of arbitrary depth. I need all the functionality of dealing with an associative array: not just simple reads and writes, but the ability to run foreach loops against it, and so on. And I need all reads/writes to actually go to a database, not to volatile memory.
I'm personally fond of raw Ruby, but if there's a specific language or framework that gives me this one feature, it would make the rest of this project easy enough to be worth using.

Any language or framework? How about Python + SQLAlchemy + PostgreSQL:
Python, because you can easily create new types that behave for all the world like regular dicts.
PostgreSQL, because it has two particularly interesting types that are uncommon in other SQL databases; we'll get to those in a moment.
SQLAlchemy, because it can do all of the dirty work of dealing with an RDBMS concisely.
Using a SQL database like this is awkward, because the normal 'key' you would want in a table has to be a fixed set of columns; ideally, you'd want a single column that holds the whole "path" into the deeply nested mapping.
An even more irritating problem is that you seem to want to store a range of different types at the leaves. This is not ideal.
Fortunately, PostgreSQL can help us out with both issues. For the first, a TEXT[] column lets a single column concisely represent the whole path, all the way down the tree. For the second, we can use JSON, which is exactly what it sounds like: it permits arbitrary JSON-encodable types, which significantly includes both the strings and numbers in your code example.
Because I am lazy, I'll use SQLAlchemy to do most of the work. First, we need a table using the above types:
from sqlalchemy import *
from sqlalchemy.orm import *
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.dialects import postgresql as pg

Base = declarative_base()

class AssocStorage(Base):
    __tablename__ = 'assoc_storage'
    # The full "path" into the nested mapping, stored as a text array.
    key = Column(pg.ARRAY(pg.TEXT, as_tuple=True), primary_key=True)
    # The leaf value, stored as JSON so strings and numbers both work.
    value = Column(pg.JSON)
That will give us a relational version of a single entry in a deeply nested mapping. We're most of the way there already:
>>> engine = create_engine('postgresql:///database')
>>> Base.metadata.create_all(engine) ## beware, extreme laziness
>>> session = Session(bind=engine) ## also unusually lazy; real applications should use `sessionmaker`
>>> session.add(AssocStorage(key=('foo','bar'), value=5))
>>> session.commit()
>>> x = session.query(AssocStorage).get((('foo', 'bar'),))
>>> x.key
(u'foo', u'bar')
>>> x.value
5
Okay, not too bad, but this is a little annoying to use. Like I mentioned earlier, the Python type system is compliant enough to make this look like a normal dict; we need only provide a class that implements the proper protocol:
import collections

class PersistentDictView(collections.MutableMapping):
    # a "production grade" version should actually implement these:
    __delitem__ = __iter__ = __len__ = NotImplemented

    def __init__(self, session):
        self.session = session

    def __getitem__(self, key):
        return self.session.query(AssocStorage).get((key, )).value

    def __setitem__(self, key, value):
        existing_item = self.session.query(AssocStorage).get((key, ))
        if existing_item is None:
            existing_item = AssocStorage(key=key)
            self.session.add(existing_item)
        existing_item.value = value
This is slightly different from the code you've posted: where you have x[a][b][c], this requires x[a, b, c].
>>> d = PersistentDictView(session)
>>> d['foo', 'bar', 'baz'] = 5
>>> d['foo', 'bar', 'baz']
5
>>> d['foo', 'bar', 'baz'] += 5
>>> session.commit()
>>> d['foo', 'bar', 'baz']
10
If you really need the nested x[a][b][c] syntax for some reason, you could get that behavior with only a bit more work; a rough sketch follows. Additionally, this totally punts on transaction management; notice the explicit session.commit() above.
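For instance, here is a rough, untested sketch of that nested syntax on top of the PersistentDictView above: each lookup that does not hit a stored leaf hands back a view with a longer key prefix, so only leaf reads and writes touch the database. The class name and the drill-down heuristic are inventions for this sketch, not part of the answer above.
class NestedDictView(PersistentDictView):
    def __init__(self, session, prefix=()):
        PersistentDictView.__init__(self, session)
        self.prefix = prefix

    def __getitem__(self, key):
        full_key = self.prefix + (key,)
        row = self.session.query(AssocStorage).get((full_key,))
        if row is None:
            # No stored leaf at this path; assume the caller is drilling
            # further down and return a view with a longer prefix.
            return NestedDictView(self.session, full_key)
        return row.value

    def __setitem__(self, key, value):
        PersistentDictView.__setitem__(self, self.prefix + (key,), value)
With that, NestedDictView(session)['foo']['bar']['baz'] reads like the code in the question, at the cost of one query per level.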

MongoDB (plus its Ruby gem) was the best answer I found, and it was easy to install and learn.
It gets me 90% of the way to treating the entire game state as a multi-level hash, which is close enough. I can treat the collections as the first hash level and use them to store and retrieve multi-level hashes.
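For the record, a minimal sketch of that idea in Python with pymongo (the same shape works with the Ruby gem); the database, collection, and document names mirror the question and are illustrative:
from pymongo import MongoClient

game = MongoClient().game  # the 'game' database; collections are the first hash level

# Defining a mission card, nested paths and all:
game.consumables.update_one(
    {'_id': 'missions'},
    {'$set': {'id04536.name': 'Build a Galleon for Blackbeard',
              'id04536.requirements.lumber': 20,
              'id04536.requirements.iron': 10,
              'id04536.rewards.blackbeard_standing': 5}},
    upsert=True)

# When a player turns in this mission card:
game.players.update_one(
    {'_id': 'id3214'},
    {'$inc': {'blackbeard_standing': 5}},
    upsert=True)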

Related

How to Create a Relationship between two Different Columns in Neo4j

I am trying to create a relationship between two columns in Neo4j. My dataset is a CSV file with two columns representing co-authorship, and I want to construct a network from it. I already load the data, return it, and match it.
Loading:
load csv from 'file:///conet1.csv' as rec
Returning the data:
create (:Guys {source: rec[0], target: rec[1]})
Now I need to construct the collaboration network by making a relationship between the source and target columns. What do you propose for this purpose?
I was able to make a relationship between the mentioned columns with the NetworkX graph library in Python, like this:
import pandas as pd
import networkx as nx

# Colab.csv is a CSV file, so read_csv (not read_excel) is the right reader
df = pd.read_csv('Colab.csv', usecols=['source', 'target'])
g = nx.from_pandas_edgelist(df, 'source', 'target')
If I understand your use case, I do not believe you should be creating Guys nodes just to store relationship info. Instead, the graph-oriented approach would be to create an Author node for each author and a relationship (say, of type COLLABORATED_WITH) between the co-authors.
This might work for you, or at least give you a clue:
LOAD CSV FROM 'file:///conet1.csv' AS rec
MERGE (source:Author {id: rec[0]})
MERGE (target:Author {id: rec[1]})
CREATE (source)-[:COLLABORATED_WITH]->(target)
If it is possible that the same relationship could be re-created, you should replace the CREATE with a more expensive MERGE. Also, a work can have any number of co-authors, so having a relationship between every pair may be sub-optimal depending on what you are trying to do; but that is a separate issue.
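Since you are already working in Python, here is a minimal sketch of running that statement with the official neo4j driver; the connection URI and credentials are placeholders:
from neo4j import GraphDatabase

driver = GraphDatabase.driver('bolt://localhost:7687', auth=('neo4j', 'password'))

load_query = """
LOAD CSV FROM 'file:///conet1.csv' AS rec
MERGE (source:Author {id: rec[0]})
MERGE (target:Author {id: rec[1]})
CREATE (source)-[:COLLABORATED_WITH]->(target)
"""

with driver.session() as session:
    session.run(load_query)  # the CSV itself is read server-side by Neo4j
driver.close()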

Neo4j or MongoDB for relative attributes/relations

I wish to build a database of objects with various types of relations between them. It will be easier for me to explain by giving an example.
I wish to have a set of objects, each is described by a unique name and a set of attributes (say, height, weight, colour, etc.), but instead of values, these attributes may contain values which are relative to other objects. For example, I might have two objects, A and B, where A has height 1 and weight "weight of B + 2", and B has height "height of A + 3" and weight 4.
Some objects may have completely different attributes; for example, object C may represent a box, and objects A and B will be related to C by relations like "I appear x times in C".
Queries may include "what is the height of A/B" or what is the total weight of objects appearing in C with multiplicities.
I am a bit familiar with MongoDB, and fond of its simplicity. I have heard of Neo4j but never tried working with it. From its description, it sounds more suitable for my needs (but I can't tell whether it is capable of the task). But is MongoDB, with its simplicity, suitable as well? Or perhaps a different database engine?
I am not sure it matters, but I plan to use python as the engine which processes the queries and their outputs.
Either can do this. I tend to prefer neo4j, but either way could work.
In neo4j, you'd create a graph consisting of a node (A) and its "base" (B). You could then connect them like this:
(A:Node { weight: "base+2" })-[:base]->(B:Node { weight: 2 })
Note that modeling in this way would make it possible to change the base relationship to point to another node without changing anything about A. The downside is that you'd need a mini calculator to expand expressions like "base+2", which is easy but in any case extra work.
Interpreting your question another way, you're in a situation where you'd probably want a trigger; triggers are how graph databases handle this. If parsing that expression "base+2" at read time isn't what you want, and you want to actually set the value on A to be b.weight + 2, then you want a trigger. This would let you define some other function to be run when the graph gets updated in a certain way. In this case, when someone inserts a new :base relationship into the graph, you might check the base value (the endpoint of the relationship), add 2 to its weight, and set that new property value on the source of the relationship.
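As a concrete illustration of the read-time approach, here is a rough sketch of the "mini calculator" mentioned above, assuming expressions are limited to the form base<op><number>; the function name and grammar are invented for this example:
import operator
import re

OPS = {'+': operator.add, '-': operator.sub, '*': operator.mul, '/': operator.truediv}

def resolve(expr, base_value):
    """Evaluate an attribute that is either a literal or 'base<op><number>'."""
    if isinstance(expr, (int, float)):
        return expr  # plain literal, nothing to expand
    m = re.fullmatch(r'base([+\-*/])(\d+(?:\.\d+)?)', expr.replace(' ', ''))
    if m is None:
        raise ValueError('unsupported expression: %r' % expr)
    return OPS[m.group(1)](base_value, float(m.group(2)))

# resolve('base+2', 2) -> 4.0, matching the (A)-[:base]->(B) example above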
Yes, you can use either DBMS.
To help you decide, this is an example of how to support your uses cases in neo4j.
To create your sample data:
CREATE
(a:Foo {id: 'A'}), (b:Foo {id: 'B'}), (c:Box {id: 123}),
(h1:Height {value: 1}), (w4:Weight {value: 4}),
(a)-[:HAS_HEIGHT {factor: 1, offset: 0}]->(h1),
(a)-[:HAS_WEIGHT {factor: 1, offset: 2}]->(w4),
(b)-[:HAS_WEIGHT {factor: 1, offset: 0}]->(w4),
(b)-[:HAS_HEIGHT {factor: 1, offset: 3}]->(h1),
(c)-[:CONTAINS {count: 5}]->(a),
(c)-[:CONTAINS {count: 2}]->(b);
"A" and "B" are represented by Foo nodes, and "C" by a Box node. Since a given height or weight can be referenced by multiple nodes, this example data model uses shared Weight and Height nodes. The HAS_HEIGHT and HAS_WEIGHT relationships have factor and offset properties to allow adjustment of the height or weight for a particular Foo node.
To query "What is the height of A":
MATCH (:Foo {id: 'A'})-[ra:HAS_HEIGHT]->(ha:Height)
RETURN ra.factor * ha.value + ra.offset AS height;
To query "What is the ratio of the heights of A and B":
MATCH
(:Foo {id: 'A'})-[ra:HAS_HEIGHT]->(ha:Height),
(:Foo {id: 'B'})-[rb:HAS_HEIGHT]->(hb:Height)
RETURN
TOFLOAT(ra.factor * ha.value + ra.offset) /
(rb.factor * hb.value + rb.offset) AS ratio;
Note: TOFLOAT() is used above to make sure integer division, which would truncate, is never used.
To query "What is the total weight of objects appearing in C":
MATCH (:Box {id: 123})-[c:CONTAINS]->()-[rx:HAS_WEIGHT]->(wx:Weight)
RETURN SUM(c.count * (rx.factor * wx.value + rx.offset));
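With the sample data above, that returns 5 * (1 * 4 + 2) + 2 * (1 * 4 + 0) = 38: A weighs 6 after its +2 offset and appears 5 times in C, while B weighs 4 and appears twice.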
I have not used Mongo, and decided not to after studying it, so filter my opinion with that in mind; others may find my concerns easy to overcome. Mongo is not a true graph database: the user must create and manage the relationships. In Neo4j, relationships are "native" and robust.
There is a head-to-head comparison at this site:
https://db-engines.com/en/system/MongoDB%3bNeo4j
see also: https://stackoverflow.com/questions/10688745/database-for-efficient-large-scale-graph-traversal
There is a distinction between NoSQL (e.g., Mongo) and a true graph database. Many seem to assume that if it is not SQL then it's a graph database. This is not true: most NoSQL databases do not store relationships. The free book describes this in chapter 2.
Personally, I'm sold on Neo4j. It makes relationships, traversing graphs, and collecting lists along the path easy and powerful.

Google Datastore - Search Optimization Technique

I am dealing with a real-estate app. A Home will have typical properties like price, bedrooms, bathrooms, square footage, lot size, etc. Users will search for homes, and such a query will require multiple inequality filters, like: price between x and y, rooms greater than z, bathrooms more than p, and so on.
I know that multiple inequality filters are not allowed. I also do not want to perform any filtering in my code, because I want to be able to use cursors.
So I have come up with two solutions. I am not sure if these are right, so I wonder if gurus can shed some light.
Solution 1: I will discretize the values of each attribute and save them in a list-field, then use IN. For example: if there are 3 bedrooms, instead of storing beds=3, I will store beds = [1,2,3]. Now if a user searches for homes with, say, at least two bedrooms, then instead of writing the filter as beds > 2, I will write it as "beds IN [2]", and my home above [1,2,3] will qualify, as will any home with 2 beds [1,2], 4 beds [1,2,3,4], and so on.
Solution 2: It is similar to the first one, but instead of creating a list-property I will actually add attributes (columns) to the home. So a home with 3 bedrooms will have the following attributes/columns/properties: col-bed-1:true, col-bed-2:true, col-bed-3:true. Now if a user searches for homes with, say, at least two bedrooms, then instead of writing the filter as beds > 2, I will write it as "col-bed-2 = true", and my home will qualify, as will any home with 2 beds, 3 beds, 4 beds, and so on.
I know both solutions will work, but I want to know:
1. Which one is better both from a performance and google pricing perspective
2. Is there a better solution to do this?
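For concreteness, here is a minimal sketch of Solution 1 using NDB; the model and property names are illustrative:
from google.appengine.ext import ndb

class Home(ndb.Model):
    price = ndb.IntegerProperty()
    # Solution 1: store [1..n] instead of n, so ">= 2 beds" becomes an
    # equality filter on the repeated property.
    beds = ndb.IntegerProperty(repeated=True)

home = Home(price=250000, beds=[1, 2, 3])  # a 3-bedroom home
home.put()

# Matches any home whose beds list contains 2, i.e. any home with >= 2 beds,
# and stays compatible with query cursors:
at_least_two_beds = Home.query(Home.beds == 2)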
I do almost exactly your use case in a Python GAE app that lists posts with housing advertisements (similar to craigslist). I wrote it in Python, and searching with a filter is working and straightforward.
You should choose a language (Python, Java, or Go), then use the Google Search API (which has built-in filtering for equalities and inequalities) and build datastore indexes that you can query using the Search API.
For instance, you can use a Python class like the following to populate the datastore and then use the Search API:
from google.appengine.ext import db

class Home(db.Model):
    address = db.StringProperty(verbose_name='address')
    number_of_rooms = db.IntegerProperty()
    size = db.FloatProperty()
    added = db.DateTimeProperty(verbose_name='added', auto_now_add=True)  # read-only
    last_modified = db.DateTimeProperty(required=True, auto_now=True)
    timestamp = db.DateTimeProperty(auto_now=True)
    image_url = db.URLProperty()
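And here is a rough sketch of the Search API side (index name, field names, and values are illustrative): index one document per Home entity, then run a query that combines multiple inequalities.
from google.appengine.api import search

index = search.Index(name='homes')
index.put(search.Document(
    doc_id='home-1',
    fields=[search.NumberField(name='price', value=250000),
            search.NumberField(name='rooms', value=3)]))

# Multiple inequality filters are fine in a search query:
results = index.search('price >= 200000 AND price <= 300000 AND rooms > 2')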
I definitely think that you should avoid storing permutations, for several reasons: permutations can explode in size and make the code difficult to read. Instead you should do like I did and find examples where someone else has already solved an equal or similar problem.
This appengine demo might help you.

Check which ids in id list already exist in NDB (python)

I have a list of entities I'm loading into my front-end. If I don't have these entities in my NDB yet, I load them from another data source. If I do have them in my NDB, I obviously load them from there.
Instead of querying for every key separately to test whether it exists, I'd like to query for the whole list (for efficiency reasons) and find out what IDs exist in the NDB and what don't.
It could return a list of booleans, but any other practical solution is welcome.
Thanks already for your help!
How about doing a ndb.get_multi() with your list, and then comparing the results with your original list to find what you need to retrieve from the other data source? Something like this perhaps...
list_of_ids = [1,2,3 ... ]
# You have to use keys to query with get_multi() (this is assuming that
# your list of ids are also the key ids in NDB)
keys_list = [ndb.Key('DB_Kind', x) for x in list_of_ids]
results = ndb.get_multi(keys_list)
results = [x for x in results if x is not None]  # Get rid of any Nones
result_keys = [x.key.id() for x in results]
diff = list(set(list_of_ids) - set(result_keys))  # Get the difference in the lists
# diff should now have the ids that weren't in NDB, and results should have
# the entities that were in NDB.
I can't vouch for the performance of this, but it should be more efficient than querying for each entity one at a time. In my experience, ndb.get_multi() is a huge performance booster, since it cuts down on a huge number of RPCs. You could likely tweak the code above, but perhaps it will at least point you in the right direction.

Advice on architecture for storing some trivial values in Django database?

I'd like to store some trivial values for each user in the database, like whether the user can see the newcomers' banner, the instructions on how to use each of the features, etc. The number of values can increase as we come across new ideas.
So I've thought about two solutions for storing these data: either have a field for each of these values (so the structure of the table will change at least a few times), or have one field for all these types of data, stored as a dictionary in the field. (In the latter case, I'm worried about whether it hurts db performance; I'd also need to write more logic for serializing the dictionary to a string and back; and storing dictionaries in a db seems to contradict what a db is for.)
models.py
class Instruction(models.Model):
    user = models.ForeignKey('auth.User')
    can_see_feature_foo_instruction = models.BooleanField()
    can_see_feature_bar_instruction = models.BooleanField()
    ...
or
class Instruction(models.Model):
    user = models.ForeignKey('auth.User')
    instruction_prefs = models.CharField()  # Value will be "{'can_see_foo_inst':True, 'can_see_bar_inst':False, ...}"
Which will be the best solution?
It depends on whether you need to be able to search on these fields. If so, the text-field option is not really suitable, as the individual flags won't be indexed. But if not, then it is a perfectly good way of going about it. You might want to consider storing the data as JSON, which is useful as a method of serializing dicts to text and getting them back. There are plenty of implementations of "JSONField" for Django around that will take care of serializing/deserializing the JSON for you.
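As one illustration: recent Django versions ship a built-in JSONField, so the second model could look like the sketch below (on older versions, a third-party JSONField package fills the same role):
from django.db import models

class Instruction(models.Model):
    user = models.ForeignKey('auth.User', on_delete=models.CASCADE)
    # The whole preferences dict lives in one JSON column.
    instruction_prefs = models.JSONField(default=dict)

# Usage:
# prefs = Instruction.objects.get(user=some_user).instruction_prefs
# prefs['can_see_foo_inst'] = True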
Django has a built-in permission system. Try reading this link https://docs.djangoproject.com/en/dev/topics/auth/#permissions
Update
I think if you really want to use an Instruction model, you can use something like a JSONField to store the instructions. This way you can do something like instruction.key to access a value. You can try using this: https://github.com/derek-schaefer/django-json-field
You can create model for key value pair of instructions/permissions per user.
E.g.
class Instruction(models.Model):
    user = models.ForeignKey('auth.User')
    key = models.CharField(max_length=20)
    value = models.BooleanField()
Then you can create multiple instances of this for each user depending upon permissions he has.
>>> instr1 = Instruction()
>>> instr1.user = user1
>>> instr1.key = 'can_see_feature_foo'
>>> instr1.value = True
>>> instr1.save()
>>> instr2 = Instruction()
>>> instr2.user = user1
>>> instr2.key = 'can_see_feature_bar'
>>> instr2.value = True
>>> instr2.save()
....
# To query:
>>> Instruction.objects.filter(user=user1, key='can_see_feature_bar')
If you use a model with a CharField to store the instruction and a ManyToManyField to the users, you can create and assign any number of instructions to any number of users.
class Instruction(models.Model):
    user = models.ManyToManyField('auth.User')
    instruction = models.CharField(max_length=100)  # Value will be a single instruction
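A hypothetical usage sketch for that model (user1 and user2 stand in for existing auth.User instances):
instr = Instruction.objects.create(instruction='can_see_feature_foo')
instr.user.add(user1, user2)   # one instruction shared by many users

user1.instruction_set.all()    # every instruction assigned to user1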
